
Title Of The Invention 

NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
CANDIDA ALBICANS FOR DIAGNOSTICS AND THERAPEUTICS 


Cross-Reference to Related Applications 

This application is converted from U.S. provisional application Serial Number
60/074,725, filed February 13, 1998, and U.S. provisional application Serial Number
60/096,409, filed August 13, 1998.

Field Of The Invention 

The invention relates to isolated nucleic acids and polypeptides derived from
Candida albicans that are useful as molecular targets for diagnostics, prophylaxis and
treatment of pathological conditions, as well as materials and methods for the
diagnosis, prevention, and amelioration of pathological conditions resulting from
fungal infection.

Background Of The Invention

Candida albicans is a dimorphic fungus which has both a yeast-like growth
habit and a filamentous form consisting of both hyphae and pseudohyphae. The fungus
is a member of the normal surface flora of most individuals. Although no sexual state
has been described for C. albicans, the genome is diploid in most strains (Whelan,
WL et al. (1980) Mol. Gen. Genet. 180: 107-113; Whelan, WL and Magee, PT (1981)
J. Bacteriol. 145: 896-903; Poulter, R. (1982) J. Bacteriol. 152: 969-975) and
rearranges relatively frequently (Rustchenko-Bulgac, EP et al. (1990) J. Bacteriol. 172:
1276-1283; Barton, RC and Scherer, S (1994) J. Bacteriol. 176: 756-763). In addition,
one non-universal decoding is known in which a leucine codon (CUG) is translated as
a serine (Leuker et al. (1994) Mol. Gen. Genet. 245: 212-217; Santos et al. (1993)
EMBO Journal 12: 607-616). These features create difficulties in the application of the
powerful genetic and molecular methods used in Saccharomyces and
Schizosaccharomyces.
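
By way of illustration only, the practical effect of the CUG decoding noted above can be sketched in Python. The sketch assumes the standard genetic code for every codon other than CTG (CUG in mRNA); the names STANDARD_TABLE, CANDIDA_TABLE and translate are hypothetical and are not part of the disclosure.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in T, C, A, G order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Assumption: the C. albicans table differs from the standard one only at CTG,
# which is read as serine (S) instead of leucine (L).
CANDIDA_TABLE = dict(STANDARD_TABLE, CTG="S")


def translate(dna, table):
    """Translate a coding sequence codon by codon; 'X' marks unknown codons."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(table.get(codon, "X") for codon in codons)
```

Translating the same coding sequence with both tables shows why a C. albicans gene expressed in a host that uses the standard code may yield a protein with leucine at every position where the native protein carries serine.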

C. albicans exists as part of the normal microbial flora in humans, but can
produce opportunistic infections ranging from topical infections such as oral thrush to
life-threatening disseminated mycoses (Ampel, NM (1996) Emerg. Infect. Dis. 2: 109-116).
Candida is a major cause of nosocomial infections and was found to account for
more than 75% of all fungal nosocomial infections reported by NNIS (National
Nosocomial Infections Surveillance) hospitals from 1980-1990, in which fungi alone
accounted for 7.9% of all nosocomial infections (Beck-Sagué, CM and Jarvis, WR
(1993) J. Infect. Dis. 167: 1247-1251). Although the source of Candida in infections
is frequently traced to endogenous sources on the patient, it has also been traced to
exogenous sources in the hospital environment, including contaminated solutions and
equipment (Sherertz, RJ et al. (1992) J. Pediatr. 120: 455-461; Weems, JJ et al. (1987)
J. Clin. Microbiol. 25: 1029-1032), and health care workers (Hunter, PR et al.
(1990) J. Med. Vet. Mycol. 28: 317-325; Burnie, JP (1986) J. Hosp. Infect. 8: 1-4;
Doebbeling, BN et al. (1991) J. Clin. Microbiol. 29: 1268-1270). Numerous
investigations into the molecular basis of pathogenicity have been made, implicating
the hyphal form (Lo, HJ et al. (1997) Cell 90: 939-949), surface molecules including
adhesins (Fukazawa, Y and Kagaya, K (1997) J. Med. Vet. Mycol. 35: 87-99), and
ATP-binding cassette-containing multi-drug resistance proteins (Prasad, R et al. (1995)
Curr. Genet. 27: 320-329).

The antimicrobials currently in use against Candida are generally of three
types: azoles, such as fluconazole, itraconazole, and clotrimazole; polyenes, such as
amphotericin B and nystatin; and 5-fluorocytosine. However, invasive infections are
treated primarily with fluconazole, amphotericin B, and 5-fluorocytosine, although the
latter two compounds have significant toxic side effects. The development of
resistance to fluconazole by C. albicans has been noted by a number of researchers
(Redding, S (1994) Clin. Infect. Dis. 18: 339-346; Sangeorzan, JA (1994) Am. J. Med.
97: 339-346; Revankar, SG et al. (1996) J. Infect. Dis. 174: 821-827; Marr, KA et al.
(1997) Clin. Infect. Dis. 25: 908-910). Relatively short treatments seem to result in
few if any resistant isolates, but extended treatments, including prophylactic
treatments such as are required among immunocompromised and AIDS patients,
result in the appearance of fluconazole-resistant strains (Johnson, EM (1995) J.
Antimicrob. Chemother. 35: 103-114). Development of fluconazole resistance has
been observed to be associated with the development of amphotericin resistance
(Vazquez, JA (1996) Antimicrob. Agents Chemother. 40: 2511-2516; Nolte, FS et al.
(1997) Antimicrob. Agents Chemother. 41: 196-199; White, TC (1997) ASM News 63:
427-433), consistent with the action of both drugs on ergosterol in the membrane.

The difficulty in diagnosing C. albicans infections, the limited spectrum of
current therapeutic drugs, and the development of drug-resistant strains of C. albicans
provide the rationale for the identification of targets for more rapid and effective
methods of identification, prevention, and treatment of candidiasis. The elucidation
of the genome of C. albicans would enhance the understanding of how C. albicans, as
well as other fungi, causes invasive disease and how best to combat fungal infection.

Summary Of The Invention 

The present invention fulfills the need for diagnostic tools and therapeutics by
providing fungal-specific compositions and methods for detecting, treating, and
preventing fungal infection, in particular C. albicans infection. These compositions
also have use as biocontrol agents for plants.

The present invention encompasses isolated nucleic acids and polypeptides
derived from C. albicans that are useful as reagents for diagnosis of fungal disease,
components of effective antifungal vaccines, and/or as targets for antifungal drugs
including anti-C. albicans drugs. They can also be used to detect the presence of C.
albicans and other Candida species in a sample, and in screening compounds for the
ability to interfere with the C. albicans life cycle or to inhibit C. albicans infection.

More specifically, this invention features compositions of nucleic acids
corresponding to entire coding sequences of C. albicans proteins, including surface or
secreted proteins or parts thereof, nucleic acids capable of binding mRNA encoding C.
albicans proteins to block protein translation, and methods for producing C. albicans
proteins or parts thereof using peptide synthesis and recombinant DNA techniques.
This invention also features antibodies and nucleic acids useful as probes to detect C.
albicans infection. In addition, vaccine compositions and methods for the protection
or treatment of infection by C. albicans are within the scope of this invention.

The nucleotide sequences provided in SEQ ID NO: 1 - SEQ ID NO: 14103, a
fragment thereof, or a nucleotide sequence at least about 99.5% identical to a
sequence contained within SEQ ID NO: 1 - SEQ ID NO: 14103 may be "provided" in
a variety of media to facilitate use thereof. As used herein, "provided" refers to a
manufacture, other than an isolated nucleic acid molecule, which contains a
nucleotide sequence of the present invention, i.e., the nucleotide sequence provided in
SEQ ID NO: 1 - SEQ ID NO: 14103, a fragment thereof, or a nucleotide sequence at
least about 99.5% identical to a sequence contained within SEQ ID NO: 1 - SEQ ID
NO: 14103. Uses for and methods for providing nucleotide sequences in a variety of
media are well known in the art (see, e.g., EPO Publication No. EP 0 756 006).

In one application of this embodiment, a nucleotide sequence of the present
invention can be recorded on computer readable media. As used herein, "computer
readable media" refers to any media which can be read and accessed directly by a
computer. Such media include, but are not limited to: magnetic storage media, such
as floppy discs, hard disc storage media, and magnetic tape; optical storage media
such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of
these categories such as magnetic/optical storage media. A person skilled in the art
can readily appreciate how any of the presently known computer readable media can
be used to create a manufacture comprising computer readable media having recorded
thereon a nucleotide sequence of the present invention.

As used herein, "recorded" refers to a process for storing information on
computer readable media. A person skilled in the art can readily adopt any of the
presently known methods for recording information on computer readable media to
generate manufactures comprising the nucleotide sequence information of the present
invention.

A variety of data storage structures are available to a person skilled in the art
for creating a computer readable medium having recorded thereon a nucleotide
sequence of the present invention. The choice of the data storage structure will
generally be based on the means chosen to access the stored information. In addition,
a variety of data processor programs and formats can be used to store the nucleotide
sequence information of the present invention on computer readable media. The
sequence information can be represented in a word processing text file, formatted in
commercially available software such as WordPerfect and Microsoft Word, or
represented in the form of an ASCII file, or stored in a database application, such as
DB2, Sybase, Oracle, or the like. A person skilled in the art can readily adapt any
number of data processor structuring formats (e.g., text file or database) in order to
obtain computer readable media having recorded thereon the nucleotide sequence
information of the present invention.
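
As a non-limiting sketch of the two structuring formats mentioned above (an ASCII text file and a database), the following Python fragment records the same sequence information both ways. It is an illustration under stated assumptions only: the FASTA-style text layout, the table name "sequences", and the use of SQLite in place of DB2, Sybase, or Oracle are choices made for the example, and the records are placeholders rather than sequences from the Sequence Listing.

```python
import sqlite3


def to_fasta(records, width=60):
    """Render {identifier: sequence} as FASTA-style ASCII text."""
    lines = []
    for seq_id, seq in records.items():
        lines.append(f">{seq_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def to_database(records, path=":memory:"):
    """Store the same records in a relational table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE sequences (seq_id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", records.items())
    conn.commit()
    return conn


# Placeholder data only; hypothetical identifiers and residues.
records = {"example_1": "ATGCTGAAATAA", "example_2": "ATGGCCTGA"}
fasta_text = to_fasta(records)
conn = to_database(records)
row = conn.execute(
    "SELECT residues FROM sequences WHERE seq_id = ?", ("example_2",)
).fetchone()
```

The text form suits sequential reading by analysis programs, while the table form suits retrieval of individual records by identifier, which reflects the statement above that the choice of structure follows from the means of access.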

By providing the nucleotide sequence of SEQ ID NO: 1 - SEQ ID NO: 14103,
a fragment thereof, or a nucleotide sequence at least about 99.5% identical to SEQ ID
NO: 1 - SEQ ID NO: 14103 in computer readable form, a person skilled in the art can
routinely access the coding sequence information for a variety of purposes. Computer
software is publicly available which allows a person skilled in the art to access
sequence information provided in a computer readable medium. Examples of such
computer software include programs of the "Staden Package", "DNA Star",
"MacVector", GCG "Wisconsin Package" (Genetics Computer Group, Madison, WI)
and "NCBI Toolbox" (National Center For Biotechnology Information). Suitable
programs are described, for example, in Martin J. Bishop, ed., Guide to Human
Genome Computing, 2d Edition, Academic Press, San Diego, CA. (1998); and
Leonard F. Peruski, Jr., and Anne Harwood Peruski, The Internet and the New
Biology: Tools for Genomic and Molecular Research, American Society for
Microbiology, Washington, D.C. (1997).

Computer algorithms enable the identification of C. albicans open reading
frames (ORFs) within SEQ ID NO: 1 - SEQ ID NO: 14103 which contain homology
to ORFs or proteins from other organisms. Examples of such similarity-search
algorithms include the BLAST [Altschul et al., J. Mol. Biol. 215:403-410 (1990)] and
Smith-Waterman [Smith and Waterman (1981) Advances in Applied Mathematics,
2:482-489] search algorithms. Suitable search algorithms are described, for example,
in Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition, Academic
Press, San Diego, CA. (1998); and Leonard F. Peruski, Jr., and Anne Harwood
Peruski, The Internet and the New Biology: Tools for Genomic and Molecular
Research, American Society for Microbiology, Washington, D.C. (1997). Such
algorithms are utilized on computer systems as exemplified below. The ORFs so
identified represent protein encoding fragments within the C. albicans genome and
are useful in producing commercially important proteins such as enzymes used in
fermentation reactions and in the production of commercially useful metabolites.
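
For orientation only, the notion of an open reading frame can be sketched as a simple scan for an ATG codon followed in frame by a stop codon. This naive scan is not the similarity-based identification described above, which relies on BLAST or Smith-Waterman comparison against known sequences; it examines only the strand supplied, and the function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for each ATG-to-stop ORF on the given strand.

    Coordinates are 0-based and end-exclusive, and the span includes the stop
    codon. min_codons is the minimum number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume the search after this stop codon
    return orfs
```

Candidate ORFs found this way would still need to be translated and compared against known proteins before any homology could be asserted.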

The present invention further provides systems, particularly computer-based
systems, which contain the sequence information described herein. Such systems are
designed to identify commercially important fragments of the C. albicans genome.
As used herein, "a computer-based system" refers to the hardware means, software
means, and data storage means used to analyze the nucleotide sequence information
of the present invention. The minimum hardware means of the computer-based
systems of the present invention comprises a central processing unit (CPU), input
means, output means, and data storage means. A person skilled in the art can readily
appreciate that any one of the currently available computer-based systems is suitable
for use in the present invention. The computer-based systems of the present invention
comprise a data storage means having stored therein a nucleotide sequence of the
present invention and the necessary hardware means and software means for
supporting and implementing a search means. As used herein, "data storage means"
refers to memory which can store nucleotide sequence information of the present
invention, or a memory access means which can access manufactures having recorded
thereon the nucleotide sequence information of the present invention.

As used herein, "search means" refers to one or more programs which are
implemented on the computer-based system to compare a target sequence or target
structural motif with the sequence information stored within the data storage means.
Search means are used to identify fragments or regions of the C. albicans
genome which are similar to, or "match", a particular target sequence or target motif.
A variety of algorithms are known in the art and have been disclosed publicly,
and a variety of commercially available software packages for conducting
homology-based similarity searches can be used in the computer-based systems of the
present invention. Examples of such software include, but are not limited to, FASTA
(GCG Wisconsin Package), Bic_SW (Compugen Bioccelerator), BLASTN2,
BLASTP2, BLASTX2 (NCBI) and Motifs (GCG). Suitable software programs are
described, for example, in Martin J. Bishop, ed., Guide to Human Genome
Computing, 2d Edition, Academic Press, San Diego, CA. (1998); and Leonard F.
Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology: Tools
for Genomic and Molecular Research, American Society for Microbiology,
Washington, D.C. (1997). A person skilled in the art will readily recognize that any
one of the available algorithms or implementing software packages for conducting
homology searches can be adapted for use in the present computer-based systems.
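
A search means of the kind described above can be illustrated, without limitation, by a compact Smith-Waterman local alignment score. The sketch below is a simplification under stated assumptions: it uses a linear gap penalty and arbitrary scoring values (match 2, mismatch -1, gap -1), returns only the best score rather than the alignment, and does not reproduce any of the named software packages.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the matrix
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A cell never drops below zero, which is what makes the
            # alignment local rather than global.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

Only two rows of the scoring matrix are held in memory at a time, so the sketch scales to long stored sequences at the cost of not being able to trace back the aligned residues.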

As used herein, a "target sequence" can be any DNA or amino acid sequence
of six or more nucleotides or two or more amino acids. A person skilled in the art can
readily recognize that the longer a target sequence is, the less likely a target sequence
will be present as a random occurrence in the database. The most preferred sequence
length of a target sequence is from about 10 to 100 amino acids or from about 30 to
300 nucleotide residues. However, it is well recognized that many genes are longer
than 500 amino acids, or 1.5 kb in length, and that commercially important fragments
of the C. albicans genome, such as sequence fragments involved in gene expression
and protein processing, will often be shorter than 30 nucleotides.
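
The relationship between target length and chance occurrence stated above can be made concrete with a rough estimate. The sketch assumes a database of independent, equally frequent residues, which real genomes do not satisfy, so the result is an order-of-magnitude guide only; the function name and parameters are hypothetical.

```python
def expected_random_hits(target_len, database_len, alphabet_size=4):
    """Expected number of exact chance matches of one target in a random database.

    Assumes independent, equally frequent residues. Use alphabet_size=4 for
    nucleotides and alphabet_size=20 for amino acids.
    """
    positions = max(database_len - target_len + 1, 0)  # possible start sites
    return positions * (1.0 / alphabet_size) ** target_len
```

Under these assumptions each additional nucleotide in the target divides the expected number of chance matches by four, so a six-nucleotide target is expected to recur by chance many times in a database of millions of residues, whereas a 30-nucleotide target is not.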

As used herein, "a target structural motif," or "target motif," refers to any
rationally selected sequence or combination of sequences in which the sequence(s) are
chosen based on a specific functional domain or three-dimensional configuration
which is formed upon the folding of the target polypeptide. A variety of target
motifs are known in the art. Protein target motifs include, but are not limited to,
enzymatic active sites, membrane-spanning regions, and signal sequences. Nucleic
acid target motifs include, but are not limited to, promoter sequences, hairpin
structures and inducible expression elements (protein binding sequences).
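
Where a target motif can be written as a pattern of allowed residues, a search for it can be sketched with a regular expression. The example is illustrative only: the function name is hypothetical, and the pattern used in the usage note (a TATA-like promoter element) is an assumption chosen for the example rather than a motif taken from the Sequence Listing. Motifs defined by three-dimensional configuration cannot be captured this way.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A zero-width lookahead lets matches overlap; group 1 holds the matched text.
    regex = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]
```

For example, find_motif("GGTATAAATGG", "TATA[AT]A[AT]") reports one match beginning at position 2 (counting from zero).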

A variety of structural formats for the input and output means can be used to
input and output the information in the computer-based systems of the present
invention. A preferred format for an output means ranks fragments of the C. albicans
genome possessing varying degrees of homology to the target sequence or target
motif. Such presentation provides a person skilled in the art with a ranking of
sequences which contain various amounts of the target sequence or target motif and
identifies the degree of homology contained in the identified fragment.

A variety of comparing means can be used to compare a target sequence or
target motif with the data storage means to identify sequence fragments of the C.
albicans genome. In the present examples, software implementing
the BLASTP2 and Bic_SW algorithms (Altschul et al., J. Mol. Biol. 215:403-410
(1990); Compugen Bioccelerator) was used to identify open reading frames within the
C. albicans genome. A person skilled in the art can readily recognize that any one of
the publicly available homology search programs can be used as the search means for
the computer-based systems of the present invention. Suitable programs are
described, for example, in Martin J. Bishop, ed., Guide to Human Genome
Computing, 2d Edition, Academic Press, San Diego, CA. (1998); and Leonard F.
Peruski, Jr., and Anne Harwood Peruski, The Internet and the New Biology: Tools
for Genomic and Molecular Research, American Society for Microbiology,
Washington, D.C. (1997).
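
The ranked output format described above can be sketched as follows. The comparing step used here, the best ungapped identity over all placements of the target within each stored fragment, is a deliberately simple stand-in chosen for the example and is not the BLASTP2 or Bic_SW comparison used in the present examples; all names and data are hypothetical.

```python
def best_window_identity(target, fragment):
    """Highest fraction of matching positions over all ungapped placements."""
    if not target or len(fragment) < len(target):
        return 0.0
    best = 0
    for offset in range(len(fragment) - len(target) + 1):
        window = fragment[offset:offset + len(target)]
        matches = sum(a == b for a, b in zip(target, window))
        best = max(best, matches)
    return best / len(target)


def rank_fragments(target, fragments):
    """Return (identifier, identity) pairs sorted from most to least similar."""
    scored = [
        (name, best_window_identity(target, seq))
        for name, seq in fragments.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Each entry of the returned list pairs a fragment identifier with its degree of similarity, so that the fragments most similar to the target appear first.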

The invention features C. albicans polypeptides, preferably a substantially
pure preparation of a C. albicans polypeptide, or a recombinant C. albicans
polypeptide. In preferred embodiments: the polypeptide has biological activity; the
polypeptide has an amino acid sequence at least about 60%, 70%, 80%, 90%, 95%,
98%, or 99% identical to an amino acid sequence of the invention contained in the
Sequence Listing, preferably it has about 65% sequence identity with an amino acid
sequence of the invention contained in the Sequence Listing, and most preferably it
has about 92% to about 99% sequence identity with an amino acid sequence of the
invention contained in the Sequence Listing; the polypeptide has an amino acid
sequence essentially the same as an amino acid sequence of the invention contained in
the Sequence Listing; the polypeptide is at least about 5, 10, 20, 50, 100, or 150
amino acid residues in length; the polypeptide includes at least about 5, preferably at
least about 10, more preferably at least about 20, more preferably at least about 50,
100, or 150 contiguous amino acid residues of the invention contained in the
Sequence Listing. In yet another preferred embodiment, an amino acid sequence
which differs in sequence identity by about 7% to about 8% from the C. albicans
amino acid sequences of the invention contained in the Sequence Listing is also
encompassed by the invention.

In preferred embodiments: the C. albicans polypeptide is encoded by a
nucleic acid of the invention contained in the Sequence Listing, or by a nucleic acid
having at least about 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with a
nucleic acid of the invention contained in the Sequence Listing.
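
The percentage thresholds recited above presuppose a way of computing sequence identity. One common convention, shown here as an assumption and not as the definition adopted in this specification, counts identical residues over the columns of a pairwise alignment; other conventions use a different denominator and give different values. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns holding identical residues.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a difference.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        return 0.0
    identical = sum(a == b and a != gap for a, b in columns)
    return 100.0 * identical / len(columns)
```

Under this convention, a ten-residue polypeptide that differs from a listed sequence at a single aligned position is 90% identical to it.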

In a preferred embodiment, the subject C. albicans polypeptide differs in
amino acid sequence at about 1, 2, 3, 5, 10 or more residues from a sequence of the
invention contained in the Sequence Listing. The differences, however, are such that
the C. albicans polypeptide exhibits a C. albicans biological activity, e.g., the C.
albicans polypeptide retains a biological activity of a naturally occurring C. albicans
enzyme.

In preferred embodiments, the polypeptide includes all or a fragment of an
amino acid sequence of the invention contained in the Sequence Listing, fused, in
reading frame, to additional amino acid residues, preferably to residues encoded by
genomic DNA 5' or 3' to the genomic DNA which encodes a sequence of the
invention contained in the Sequence Listing.

In yet other preferred embodiments, the C. albicans polypeptide is a
recombinant fusion protein having a first C. albicans polypeptide portion and a
second polypeptide portion, e.g., a second polypeptide portion having an amino acid
sequence unrelated to C. albicans. The second polypeptide portion can be, e.g., any
of glutathione-S-transferase, a DNA binding domain, or a polymerase activating
domain. In a preferred embodiment, the fusion protein can be used in a two-hybrid
assay.

Polypeptides of the invention include those which arise as a result of
alternative transcription events, alternative RNA splicing events, and alternative
translational and posttranslational events.

In a preferred embodiment, the encoded C. albicans polypeptide differs (e.g.,
by amino acid substitution, addition or deletion of at least one amino acid residue) in
amino acid sequence at about 1, 2, 3, 5, 10 or more residues from a sequence of the
invention contained in the Sequence Listing. The differences, however, are such that
the C. albicans encoded polypeptide exhibits a C. albicans biological activity, e.g.,
the encoded C. albicans enzyme retains a biological activity of a naturally occurring
C. albicans enzyme.

In preferred embodiments, the encoded polypeptide includes all or a fragment
of an amino acid sequence of the invention contained in the Sequence Listing, fused,
in reading frame, to additional amino acid residues, preferably to residues encoded by
genomic DNA 5' or 3' to the genomic DNA which encodes a sequence of the
invention contained in the Sequence Listing.

The C. albicans strain from which the nucleotide sequences have been
sequenced is strain SC5314, a clinical isolate which was originally obtained from a
patient with disseminated candidiasis.

Included in the invention are: allelic variations; natural mutants; induced
mutants; proteins encoded by DNA that hybridizes under high or low stringency
conditions to a nucleic acid which encodes a polypeptide of the invention contained in
the Sequence Listing (for definitions of high and low stringency see Current Protocols
in Molecular Biology, John Wiley & Sons, New York, 1989, 6.3.1 - 6.3.6, hereby
incorporated by reference); and polypeptides specifically bound by antisera to C.
albicans polypeptides, especially by antisera to an active site or binding domain of a C.
albicans polypeptide. The invention also includes fragments, preferably biologically
active fragments. These and other polypeptides are also referred to herein as C.
albicans polypeptide analogs or variants.

The invention further provides nucleic acids, e.g., RNA or DNA, encoding a
polypeptide of the invention. This includes double stranded nucleic acids as well as
coding and antisense single strands.

In preferred embodiments, the subject C. albicans nucleic acid will include a
transcriptional regulatory sequence, e.g., at least one of a transcriptional promoter or
transcriptional enhancer sequence, operably linked to the C. albicans gene sequence,
e.g., to render the C. albicans gene sequence suitable for expression in a recombinant
host cell.

In yet a further preferred embodiment, the nucleic acid which encodes a C.
albicans polypeptide of the invention hybridizes under stringent conditions to a
nucleic acid probe corresponding to at least about 8 consecutive nucleotides of the
invention contained in the Sequence Listing; more preferably to at least about 12
consecutive nucleotides of the invention contained in the Sequence Listing; more
preferably to at least about 20 consecutive nucleotides of the invention contained in
the Sequence Listing; more preferably to at least about 40 consecutive nucleotides of
the invention contained in the Sequence Listing.

In another aspect, the invention provides a substantially pure nucleic acid
having a nucleotide sequence which encodes a C. albicans polypeptide. In preferred
embodiments: the encoded polypeptide has biological activity; the encoded
polypeptide has an amino acid sequence at least about 60%, 70%, 80%, 90%, 95%,
98%, or 99% homologous to an amino acid sequence of the invention contained in the
Sequence Listing; the encoded polypeptide has an amino acid sequence essentially the
same as an amino acid sequence of the invention contained in the Sequence Listing;
the encoded polypeptide is at least about 5, 10, 20, 50, 100, or 150 amino acids in
length; the encoded polypeptide comprises at least about 5, preferably at least about
10, more preferably at least about 20, more preferably at least about 50, 100, or 150
contiguous amino acids of the invention contained in the Sequence Listing.

In another aspect, the invention encompasses: a vector including a nucleic
acid which encodes a C. albicans polypeptide or a C. albicans polypeptide variant
as described herein; a host cell transfected with the vector; and a method of producing
a recombinant C. albicans polypeptide or C. albicans polypeptide variant, including
culturing the cell, e.g., in a cell culture medium, and isolating a C. albicans
polypeptide or C. albicans polypeptide variant, e.g., from the cell or from the cell
culture medium.

One embodiment of the invention is directed to substantially isolated nucleic
acids. Nucleic acids of the invention include sequences comprising at least about 8
nucleotides in length, more preferably at least about 12 nucleotides in length, even
more preferably at least about 15-20 nucleotides in length, that correspond to a
subsequence of any one of SEQ ID NO: 1 - SEQ ID NO: 14103 or complements
thereof. Alternatively, the nucleic acids comprise sequences contained within any
ORF (open reading frame), including a complete protein-coding sequence, of which
any of SEQ ID NO: 1 - SEQ ID NO: 14103 forms a part. The invention encompasses



Attorney Docket No.: PATH03-13 


sequence-conservative variants and function-conservative variants of these sequences. 
The nucleic acids may be DNA, RNA, DNA/RNA duplexes, protein-nucleic acid 
(PNA), or derivatives thereof. 

In another aspect, the invention features a purified recombinant nucleic acid
having at least about 50%, 60%, 70%, 80%, 90%, 95%, 98%, or 99% homology with
a sequence of the invention contained in the Sequence Listing.

The invention also encompasses recombinant DNA (including DNA cloning
and expression vectors) comprising these C. albicans-derived sequences; host cells
comprising such DNA, including fungal, bacterial, yeast, plant, insect, and
mammalian host cells; and methods for producing expression products comprising
RNA and polypeptides encoded by the C. albicans sequences. These methods are
carried out by incubating a host cell comprising a C. albicans-derived nucleic acid
sequence under conditions in which the sequence is expressed. The host cell may be
native or recombinant. The polypeptides can be obtained by (a) harvesting the
incubated cells to produce a cell fraction and a medium fraction; and (b) recovering
the C. albicans polypeptide from the cell fraction, the medium fraction, or both. The
polypeptides can also be made by in vitro translation.

In another aspect, the invention features nucleic acids capable of binding
mRNA of C. albicans. Such nucleic acid is capable of acting as antisense nucleic
acid to control the translation of mRNA of C. albicans. A further aspect features a
nucleic acid which is capable of binding specifically to a C. albicans nucleic acid.
These nucleic acids are also referred to herein as complements and have utility as
probes and as capture reagents.
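
The complementarity on which such antisense nucleic acids and probes rely can be sketched as a reverse-complement computation. The sketch is illustrative only: it handles the unambiguous bases A, C, G, T and U, returns a DNA oligonucleotide, and says nothing about hybridization conditions or specificity; the function names are hypothetical.

```python
# Watson-Crick pairing; U in the input is read as T, and output is DNA.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the DNA reverse complement of a DNA or RNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_for(mrna, start, length):
    """DNA antisense oligonucleotide complementary to mrna[start:start + length]."""
    return reverse_complement(mrna[start:start + length])
```

For example, antisense_for("AUGCUGAAA", 0, 6) yields "CAGCAT", which is complementary to the first six nucleotides of the given message when read in the opposite orientation.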

In another aspect, the invention features an expression system comprising an
open reading frame corresponding to C. albicans nucleic acid. The nucleic acid
further comprises a control sequence compatible with an intended host. The
expression system is useful for making polypeptides corresponding to C. albicans
nucleic acid.

In another aspect, the invention encompasses: a vector including a nucleic acid
which encodes a C. albicans polypeptide or a C. albicans polypeptide variant as
described herein; a host cell transfected with the vector; and a method of producing a
recombinant C. albicans polypeptide or C. albicans polypeptide variant, including
culturing the cell, e.g., in a cell culture medium, and isolating the C. albicans
polypeptide or C. albicans polypeptide variant, e.g., from the cell or from the cell
culture medium.

Yet another embodiment of the invention encompasses reagents for
detecting fungal infection, including C. albicans infection, which comprise at least
one C. albicans-derived nucleic acid defined by any one of SEQ ID NO: 1 - SEQ ID
NO: 14103, or sequence-conservative or function-conservative variants thereof.
Alternatively, the diagnostic reagents comprise polypeptide sequences that are
contained within any open reading frames (ORFs), including complete protein-coding
sequences, contained within any of SEQ ID NO: 1 - SEQ ID NO: 14103, or
polypeptide sequences contained within any of SEQ ID NO: 14104 - SEQ ID NO:
28206, or polypeptides of which any of the above sequences forms a part, or
antibodies directed against any of the above peptide sequences or
function-conservative variants and/or fragments thereof.

The invention further provides antibodies, preferably monoclonal antibodies,
which specifically bind to the polypeptides of the invention. Methods are also
provided for producing antibodies in a host animal. The methods of the invention
comprise immunizing an animal with at least one C. albicans-derived immunogenic
component, wherein the immunogenic component comprises one or more of the
polypeptides encoded by any one of SEQ ID NO: 1 - SEQ ID NO: 14103 or
sequence-conservative or function-conservative variants thereof; or polypeptides that
are contained within any ORFs, including complete protein-coding sequences, of
which any of SEQ ID NO: 1 - SEQ ID NO: 14103 forms a part; or polypeptide
sequences contained within any of SEQ ID NO: 14104 - SEQ ID NO: 28206; or
polypeptides of which any of SEQ ID NO: 14104 - SEQ ID NO: 28206 forms a part.
Host animals include any warm-blooded animal, including without limitation
mammals and birds. Such antibodies have utility as reagents for immunoassays to
evaluate the abundance and distribution of C. albicans-specific antigens.

In yet another aspect, the invention provides diagnostic methods for detecting
C. albicans antigenic components or anti-C. albicans antibodies in a sample. C.
albicans antigenic components are detected by a process comprising: (i) contacting a
sample suspected to contain a fungal antigenic component with a fungal-specific
antibody, under conditions in which a stable antigen-antibody complex can form
between the antibody and fungal antigenic components in the sample; and (ii)
detecting any antigen-antibody complex formed in step (i), wherein detection of an
antigen-antibody complex indicates the presence of at least one fungal antigenic
component in the sample. In different embodiments of this method, the antibodies
used are directed against a sequence encoded by any of SEQ ID NO: 1 - SEQ ID NO:
14103 or sequence-conservative or function-conservative variants thereof, or against a
polypeptide sequence contained in any of SEQ ID NO: 14104 - SEQ ID NO: 28206 or
function-conservative variants thereof.

In yet another aspect, the invention provides a method for detecting
antifungal-specific antibodies in a sample, which comprises: (i) contacting a sample
suspected to contain antifungal-specific antibodies with a C. albicans antigenic
component, under conditions in which a stable antigen-antibody complex can form
between the C. albicans antigenic component and antifungal antibodies in the
sample; and (ii) detecting any antigen-antibody complex formed in step (i), wherein
detection of an antigen-antibody complex indicates the presence of antifungal
antibodies in the sample. In different embodiments of this method, the antigenic
component is encoded by a sequence contained in any of SEQ ID NO: 1 - SEQ ID
NO: 14103 or sequence-conservative and function-conservative variants thereof, or is
a polypeptide sequence contained in any of SEQ ID NO: 14104 - SEQ ID NO: 28206
or function-conservative variants thereof.

In another aspect, the invention features a method of generating vaccines for
immunizing an individual against C. albicans. The method includes: immunizing a
subject with a C. albicans polypeptide, e.g., a surface or secreted polypeptide, or a
combination of such peptides or active portion(s) thereof, and a pharmaceutically
acceptable carrier. Such vaccines have therapeutic and prophylactic utilities.

In another aspect, the invention features a method of evaluating a compound,
e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a
C. albicans polypeptide. The method includes: contacting the candidate compound
with a C. albicans polypeptide and determining if the compound binds or otherwise
interacts with a C. albicans polypeptide. Compounds which bind C. albicans
polypeptides are candidates as activators or inhibitors of the fungal life cycle. These
assays can be performed in vitro or in vivo.

In another aspect, the invention features a method of evaluating a compound,
e.g., a polypeptide, e.g., a fragment of a host cell polypeptide, for the ability to bind a
C. albicans nucleic acid, e.g., DNA or RNA. The method includes: contacting the
candidate compound with a C. albicans nucleic acid and determining if the compound
binds or otherwise interacts with a C. albicans nucleic acid. Compounds which bind
C. albicans nucleic acids are candidates as activators or inhibitors of the fungal life
cycle. These assays can be performed in vitro or in vivo.

A particularly preferred embodiment of the invention is directed to a method
of screening test compounds for anti-fungal activity, which method comprises:
selecting as a target a fungal-specific sequence, which sequence is essential to the
viability of a fungal species; contacting a test compound with said target sequence;
and selecting those test compounds which bind to said target sequence as potential
anti-fungal candidates. In one embodiment, the target sequence selected is specific to
a single species, or even a single strain, i.e., the C. albicans strain SC5314. In a
second embodiment, the target sequence is common to at least two species of fungi.
In a third embodiment, the target sequence is common to a family of fungi. The target
sequence may be a nucleic acid sequence or a polypeptide sequence. Methods
employing sequences common to more than one species of microorganism may be
used to screen candidates for broad-spectrum anti-fungal activity.

The invention also provides methods for preventing or treating disease caused
by certain fungi, including C. albicans, which are carried out by administering to an
animal in need of such treatment, in particular a warm-blooded vertebrate, including
but not limited to birds and mammals, a compound that specifically inhibits or
interferes with the function of a fungal polypeptide or nucleic acid. In a particularly
preferred embodiment, the mammal to be treated is human.



DETAILED DESCRIPTION OF THE INVENTION



The sequences of the present invention include the specific nucleic acid and
amino acid sequences set forth in the Sequence Listing that forms a part of the present
specification, and which are designated SEQ ID NO: 1 - SEQ ID NO: 28206. Use of
the terms "SEQ ID NO: 1 - SEQ ID NO: 14103", "SEQ ID NO: 14104 - SEQ ID NO:
28206", "the sequences depicted in Table 2", and like terms, is intended, for
convenience, to refer to each individual SEQ ID NO individually, and is not intended
to refer to the genus of these sequences unless such reference would be indicated. In
other words, it is a shorthand for listing all of these sequences individually. The
invention encompasses each sequence individually, as well as any combination
thereof.


Definitions 

"Nucleic acid" or "polynucleotide" as used herein refers to purine- and 
pyrimidine-containing polymers of any length, either polyribonucleotides or 
polydeoxyribonucleotides or mixed polyribo-polydeoxyribo nucleotides. This
includes single- and double-stranded molecules, i.e., DNA-DNA, DNA-RNA and
RNA-RNA hybrids, as well as "peptide nucleic acids" (PNA) formed by conjugating
bases to an amino acid backbone. This also includes nucleic acids containing
modified bases.

A nucleic acid or polypeptide sequence that is "derived from" a designated
sequence refers to a sequence that corresponds to a region of the designated sequence.
For nucleic acid sequences, this encompasses sequences that are homologous or
complementary to the sequence, as well as "sequence-conservative variants" and
"function-conservative variants." For polypeptide sequences, this encompasses
"function-conservative variants." Sequence-conservative variants are those in which
a change of one or more nucleotides in a given codon position results in no alteration
in the amino acid encoded at that position. Function-conservative variants are those
in which a given amino acid residue in a polypeptide has been changed without
altering the overall conformation and function of the native polypeptide, including,
but not limited to, replacement of an amino acid with one having similar
physico-chemical properties (such as, for example, acidic, basic, hydrophobic, and
the like). "Function-conservative" variants also include any polypeptides that have
the ability to elicit antibodies specific to a designated polypeptide.

A "C. albicans-derived" nucleic acid or polypeptide sequence may or may
not be present in other fungal species, and may or may not be present in all C.
albicans strains. This term is intended to refer to the source from which the sequence
was originally isolated. Thus, a C. albicans-derived polypeptide, as used herein, may
be used, e.g., as a target to screen for a broad spectrum antifungal agent, to search for
homologous proteins in other species of fungi or in eukaryotic organisms such as
humans, etc.

The terms "purified or isolated polypeptide" and "substantially pure preparation
of a polypeptide" are used interchangeably herein and mean a polypeptide
that has been separated from other proteins, lipids, and nucleic acids with which it
naturally occurs. Preferably, the polypeptide is also separated from substances, e.g.,
antibodies or gel matrix, e.g., polyacrylamide, which are used to purify it. Preferably,
the polypeptide constitutes at least about 10, 20, 50, 70, 80 or 95% dry weight of the
purified preparation. Preferably, the preparation contains sufficient polypeptide to
allow protein sequencing, which is preferably at least about 1, 10, or 100 mg of the
polypeptide.

A purified preparation of cells refers to, in the case of plant or animal cells, an
in vitro preparation of cells and not an entire intact plant or animal. In the case of
cultured cells or microbial cells, it consists of a preparation of at least about 10% and
more preferably at least about 50% of the subject cells.

A purified or isolated or a substantially pure nucleic acid, e.g., a substantially
pure DNA (terms used interchangeably herein), is a nucleic acid which is one or
both of the following: not immediately contiguous with both of the coding sequences
with which it is immediately contiguous (i.e., one at the 5' end and one at the 3' end)
in the naturally-occurring genome of the organism from which the nucleic acid is
derived; or which is substantially free of a nucleic acid with which it occurs in the
organism from which the nucleic acid is derived. The term includes, for example, a
recombinant DNA which is incorporated into a vector, e.g., into an autonomously
replicating plasmid or virus, or into the genomic DNA of a prokaryote or eukaryote,
or which exists as a separate molecule (e.g., a cDNA or a genomic DNA fragment
produced by PCR or restriction endonuclease treatment) independent of other DNA
sequences. Substantially pure DNA also includes a recombinant DNA which is part
of a hybrid gene encoding additional C. albicans DNA sequence.

A "contig" as used herein is a nucleic acid representing a continuous stretch of
genomic sequence of an organism.

An "open reading frame", also referred to herein as ORF, is a region of nucleic 
acid which encodes a polypeptide. This region may represent a portion of a coding 
sequence or a total sequence and can be determined from a stop to stop codon or from 
a start to stop codon. 

As used herein, a "coding sequence" is a nucleic acid which is transcribed into
messenger RNA and/or translated into a polypeptide when placed under the control of
appropriate regulatory sequences. The boundaries of the coding sequence are
determined by a translation start codon at the five prime terminus and a translation
stop codon at the three prime terminus. A coding sequence can include but is not
limited to messenger RNA, synthetic DNA, and recombinant nucleic acid sequences.

A "complement" of a nucleic acid as used herein refers to an anti-parallel or 
antisense sequence that participates in Watson-Crick base-pairing with the original 
sequence. 
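
By way of illustration only, the anti-parallel complement of a DNA strand can be computed as in the following Python sketch. The function name reverse_complement is hypothetical, and the sketch assumes an unambiguous DNA alphabet (A, C, G, T); it is not part of the disclosed methods.

```python
# Watson-Crick pairing for DNA: A pairs with T, C pairs with G.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the anti-parallel complement of a DNA sequence (read 5' to 3')."""
    # Complement each base, then reverse so the result is also written 5' to 3'.
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


print(reverse_complement("ATTGCC"))  # GGCAAT
```

Applying the function twice returns the original sequence, which is a convenient sanity check.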

A "gene product" is a protein or structural RNA which is specifically encoded
by a gene.

As used herein, the term "probe" refers to a nucleic acid, peptide or other
chemical entity which specifically binds to a molecule of interest. Probes are often
associated with or capable of associating with a label. A label is a chemical moiety
capable of detection. Typical labels comprise dyes, radioisotopes, luminescent and
chemiluminescent moieties, fluorophores, enzymes, precipitating agents,
amplification sequences, and the like. Similarly, a nucleic acid, peptide or other
chemical entity which specifically binds to a molecule of interest and immobilizes
such molecule is referred to herein as a "capture ligand". Capture ligands are typically
associated with or capable of associating with a support such as nitrocellulose, glass,
nylon membranes, beads, particles and the like. The specificity of hybridization is
dependent on conditions such as the base pair composition of the nucleotides, and the
temperature and salt concentration of the reaction. These conditions are readily
discernible to one of ordinary skill in the art using routine experimentation.

"Homologous" refers to the sequence similarity or sequence identity between 
two polypeptides or between two nucleic acid molecules. When a position in both of
the two compared sequences is occupied by the same base or amino acid monomer
subunit, e.g., if a position in each of two DNA molecules is occupied by adenine, then
the molecules are homologous at that position. The percent homology between two
sequences is the number of matching or homologous positions shared by
the two sequences, divided by the number of positions compared, multiplied by 100.
For example, if 6 of 10 of the positions in two sequences are matched or homologous
then the two sequences are 60% homologous. By way of example, the DNA sequences
ATTGCC and TATGGC share 50% homology. Generally, a comparison is made when two
sequences are aligned to give maximum homology.
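
The percent homology calculation described above can be sketched in Python as follows. This is a minimal illustration that assumes the two sequences have already been aligned to the same length; the function name percent_homology is hypothetical, and gap handling is deliberately left out.

```python
def percent_homology(seq_a: str, seq_b: str) -> float:
    """Percent of compared positions at which two pre-aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions occupied by the same base or amino acid in both sequences.
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    # Matching positions divided by positions compared, multiplied by 100.
    return 100.0 * matches / len(seq_a)


print(percent_homology("ATTGCC", "TATGGC"))  # 50.0
```

For the example in the text, positions 3, 4 and 6 match (T, G and C), giving 3 of 6 positions, or 50%.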

Nucleic acids are hybridizable to each other when at least one strand of a
nucleic acid can anneal to the other nucleic acid under defined stringency conditions.
Stringency of hybridization is determined by: (a) the temperature at which
hybridization and/or washing is performed; and (b) the ionic strength and polarity of
the hybridization and washing solutions. Hybridization requires that the two nucleic
acids contain complementary sequences; depending on the stringency of
hybridization, however, mismatches may be tolerated. Typically, hybridization of
two sequences at high stringency (such as, for example, in a solution of 0.5X SSC, at
65°C) requires that the sequences be essentially completely homologous. Conditions
of intermediate stringency (such as, for example, 2X SSC at 65°C) and low
stringency (such as, for example, 2X SSC at 55°C) require correspondingly less
overall complementarity between the hybridizing sequences. (1X SSC is 0.15 M
NaCl, 0.015 M Na citrate.)
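
As a worked illustration, the salt concentrations implied by the example SSC strengths above follow by simple scaling from the 1X composition. The Python sketch below uses only the figures given in the text; the names ssc_molarity and EXAMPLE_CONDITIONS are hypothetical.

```python
# Composition of 1X SSC as stated in the text.
SSC_1X_NACL_M = 0.15      # mol/L NaCl
SSC_1X_CITRATE_M = 0.015  # mol/L sodium citrate

# (stringency label, SSC fold concentration, temperature in degrees C) from the text.
EXAMPLE_CONDITIONS = [
    ("high", 0.5, 65),
    ("intermediate", 2.0, 65),
    ("low", 2.0, 55),
]


def ssc_molarity(fold: float) -> tuple:
    """Return (NaCl M, Na citrate M) for SSC at the given fold concentration."""
    return (SSC_1X_NACL_M * fold, SSC_1X_CITRATE_M * fold)


for label, fold, temp_c in EXAMPLE_CONDITIONS:
    nacl, citrate = ssc_molarity(fold)
    print(f"{label}: {fold}X SSC at {temp_c} C = "
          f"{nacl:.3f} M NaCl, {citrate:.4f} M Na citrate")
```

Lower salt and higher temperature both raise stringency, which is why 0.5X SSC at 65°C is the most demanding of the three example conditions.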

The terms peptides, proteins, and polypeptides are used interchangeably
herein.

As used herein, the term "surface protein" refers to all surface accessible
proteins, e.g., inner and outer membrane proteins, proteins adhering to the cell wall,
and secreted proteins.

A polypeptide has C. albicans biological activity if it has one, two and
preferably more of the following properties: (1) when expressed in the course of a
C. albicans infection, it can promote or mediate the attachment of C. albicans to a
cell; (2) it has an enzymatic activity, structural or regulatory function characteristic of
a C. albicans protein; or (3) the gene which encodes it can rescue a lethal mutation in
a C. albicans gene. A polypeptide has biological activity if it is an antagonist,
agonist, or super-agonist of a polypeptide having one of the above-listed properties.

A biologically active fragment or analog is one having an in vivo or in vitro
activity which is characteristic of the C. albicans polypeptides of the invention
contained in the Sequence Listing, or of other naturally occurring C. albicans
polypeptides, e.g., one or more of the biological activities described herein.
Especially preferred are fragments which exist in vivo, e.g., fragments which arise
from post-transcriptional processing or which arise from translation of alternatively
spliced RNAs. Fragments include those expressed in native or endogenous cells as
well as those made in expression systems, e.g., in CHO (Chinese Hamster Ovary)
cells. Because peptides such as C. albicans polypeptides often exhibit a range of
physiological properties and because such properties may be attributable to different
portions of the molecule, a useful C. albicans fragment or C. albicans analog is one
which exhibits a biological activity in any biological assay for C. albicans activity.
Most preferably the fragment or analog possesses 10%, preferably 40%, more
preferably 60%, 70%, 80% or 90% or greater of the activity of C. albicans, in any in
vivo or in vitro assay.

Analogs can differ from naturally occurring C. albicans polypeptides in amino
acid sequence or in ways that do not involve sequence, or both. Non-sequence
modifications include changes in acetylation, methylation, phosphorylation,
carboxylation, or glycosylation. Preferred analogs include C. albicans polypeptides
(or biologically active fragments thereof) whose sequences differ from the wild-type
sequence by one or more conservative amino acid substitutions or by one or more
non-conservative amino acid substitutions, deletions, or insertions which do not
substantially diminish the biological activity of the C. albicans polypeptide.
Conservative substitutions typically include the substitution of one amino acid for
another with similar characteristics, e.g., substitutions within the following groups:
valine, glycine; glycine, alanine; valine, isoleucine, leucine; aspartic acid, glutamic
acid; asparagine, glutamine; serine, threonine; lysine, arginine; and phenylalanine,
tyrosine. Other conservative substitutions can be made in view of the table below.



TABLE 1

CONSERVATIVE AMINO ACID REPLACEMENTS

For Amino Acid   Code   Replace with any of

Alanine          A      D-Ala, Gly, beta-Ala, L-Cys, D-Cys
Arginine         R      D-Arg, Lys, D-Lys, homo-Arg, D-homo-Arg, Met, Ile, D-Met,
                        D-Ile, Orn, D-Orn
Asparagine       N      D-Asn, Asp, D-Asp, Glu, D-Glu, Gln, D-Gln
Aspartic Acid    D      D-Asp, D-Asn, Asn, Glu, D-Glu, Gln, D-Gln
Cysteine         C      D-Cys, S-Me-Cys, Met, D-Met, Thr, D-Thr
Glutamine        Q      D-Gln, Asn, D-Asn, Glu, D-Glu, Asp, D-Asp
Glutamic Acid    E      D-Glu, D-Asp, Asp, Asn, D-Asn, Gln, D-Gln
Glycine          G      Ala, D-Ala, Pro, D-Pro, β-Ala, Acp
Isoleucine       I      D-Ile, Val, D-Val, Leu, D-Leu, Met, D-Met
Leucine          L      D-Leu, Val, D-Val, Leu, D-Leu, Met, D-Met
Lysine           K      D-Lys, Arg, D-Arg, homo-Arg, D-homo-Arg, Met, D-Met, Ile,
                        D-Ile, Orn, D-Orn
Methionine       M      D-Met, S-Me-Cys, Ile, D-Ile, Leu, D-Leu, Val, D-Val
Phenylalanine    F      D-Phe, Tyr, D-Thr, L-Dopa, His, D-His, Trp, D-Trp, Trans-3,4,
                        or 5-phenylproline, cis-3,4, or 5-phenylproline
Proline          P      D-Pro, L-1-thioazolidine-4-carboxylic acid, D- or L-1-
                        oxazolidine-4-carboxylic acid
Serine           S      D-Ser, Thr, D-Thr, allo-Thr, Met, D-Met, Met(O),
                        D-Met(O), L-Cys, D-Cys
Threonine        T      D-Thr, Ser, D-Ser, allo-Thr, Met, D-Met, Met(O),
                        D-Met(O), Val, D-Val
Tyrosine         Y      D-Tyr, Phe, D-Phe, L-Dopa, His, D-His
Valine           V      D-Val, Leu, D-Leu, Ile, D-Ile, Met, D-Met



Other analogs within the invention are those with modifications which
increase peptide stability; such analogs may contain, for example, one or more
non-peptide bonds (which replace the peptide bonds) in the peptide sequence. Also
included are: analogs that include residues other than naturally occurring L-amino
acids, e.g., D-amino acids or non-naturally occurring or synthetic amino acids, e.g., β
or γ amino acids; and cyclic analogs.

As used herein, the term "fragment", as applied to a C. albicans analog, will
ordinarily be at least about 20 residues, more typically at least about 40 residues,
preferably at least about 60 residues in length. Fragments of C. albicans polypeptides
can be generated by methods known to those skilled in the art. The ability of a
candidate fragment to exhibit a biological activity of a C. albicans polypeptide can be
assessed by methods known to those skilled in the art as described herein. Also
included are C. albicans polypeptides containing residues that are not required for
biological activity of the peptide or that result from alternative mRNA splicing or
alternative protein processing events.

An "immunogenic component" as used herein is a moiety, such as a C.
albicans polypeptide, analog or fragment thereof, that is capable of eliciting a
humoral and/or cellular immune response in a host animal.

An "antigenic component" as used herein is a moiety, such as a C. albicans
polypeptide, analog or fragment thereof, that is capable of binding to a specific
antibody with sufficiently high affinity to form a detectable antigen-antibody
complex.

The term "antibody" as used herein is intended to include fragments thereof 
which are specifically reactive with C. albicans polypeptides. 

As used herein, the term "cell-specific promoter" means a DNA sequence that
serves as a promoter, i.e., regulates expression of a selected DNA sequence operably
linked to the promoter, and which effects expression of the selected DNA sequence in
specific cells of a tissue. The term also covers so-called "leaky" promoters, which
regulate expression of a selected DNA primarily in one tissue, but cause expression in
other tissues as well.

Misexpression, as used herein, refers to a non-wild type pattern of gene
expression. It includes: expression at non-wild type levels, i.e., over- or
under-expression; a pattern of expression that differs from wild type in terms of the
time or stage at which the gene is expressed, e.g., increased or decreased expression
(as compared with wild type) at a predetermined developmental period or stage; a
pattern of expression that differs from wild type in terms of increased expression (as
compared with wild type) in a predetermined cell type or tissue type; a pattern of
expression that differs from wild type in terms of the splicing size, amino acid
sequence, post-translational modification, or biological activity of the expressed
polypeptide; a pattern of expression that differs from wild type in terms of the effect
of an environmental stimulus or extracellular stimulus on expression of the gene, e.g.,
a pattern of increased or decreased expression (as compared with wild type) in the
presence of an increase or decrease in the strength of the stimulus.

As used herein, "host cells" and other such terms denoting microorganisms or
higher eukaryotic cell lines cultured as unicellular entities refer to cells which can
become or have been used as recipients for a recombinant vector or other transfer
DNA, and include the progeny of the original cell which has been transfected. It is
understood by individuals skilled in the art that the progeny of a single parental cell
may not necessarily be completely identical in genomic or total DNA complement to
the original parent, due to accidental or deliberate mutation.

As used herein, the term "control sequence" refers to a nucleic acid having a
base sequence which is recognized by the host organism to effect the expression of
encoded sequences to which it is ligated. The nature of such control sequences
differs depending upon the host organism; in prokaryotes, such control sequences
generally include a promoter, ribosomal binding site, terminators, and in some cases
operators; in eukaryotes, generally such control sequences include promoters,
terminators and in some instances, enhancers. The term control sequence is intended
to include, at a minimum, all components whose presence is necessary for expression,
and may also include additional components whose presence is advantageous, for
example, leader sequences.

As used herein, the term "operably linked" refers to sequences joined or
ligated to function in their intended manner. For example, a control sequence is
operably linked to a coding sequence by ligation in such a way that expression of the
coding sequence is achieved under conditions compatible with the control sequence
and host cell.

The "metabolism" of a substance, as used herein, means any aspect of the
expression, function, action, or regulation of the substance. The metabolism of a
substance includes modifications, e.g., covalent or non-covalent modifications of the
substance. The metabolism of a substance includes modifications, e.g., covalent or
non-covalent modification, the substance induces in other substances. The
metabolism of a substance also includes changes in the distribution of the substance.
The metabolism of a substance includes changes the substance induces in the
distribution of other substances.

A "sample" as used herein refers to a biological sample, such as, for example,
tissue or fluid isolated from an individual (including without limitation plasma,
serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro
cell culture constituents, as well as samples from the environment.

Technical and scientific terms used herein have the meanings commonly
understood by one of ordinary skill in the art to which the present invention pertains,
unless otherwise defined. Reference is made herein to various methodologies known
to those of skill in the art. Publications and other materials setting forth such known
methodologies to which reference is made are incorporated herein by reference in
their entireties as though set forth in full. The practice of the invention will employ,
unless otherwise indicated, conventional techniques of chemistry, molecular biology,
microbiology, recombinant DNA, and immunology, which are within the skill of the
art. Such techniques are explained fully in the literature. See, e.g., Sambrook, Fritsch,
and Maniatis, Molecular Cloning: A Laboratory Manual, 2nd ed. (1989); DNA Cloning,
Volumes I and II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed.
1984); Nucleic Acid Hybridization (B.D. Hames & S.J. Higgins eds. 1984); the series,
Methods in Enzymology (Academic Press, Inc.), particularly Vol. 154 and Vol. 155
(Wu and Grossman, eds.); PCR-A Practical Approach (McPherson, Quirke, and
Taylor, eds., 1991); Immunology, 2d Edition, 1989, Roitt et al., C.V. Mosby
Company, New York; Advanced Immunology, 2d Edition, 1991, Male et al.,
Gower Medical Publishing, New York; DNA Cloning: A Practical Approach,
Volumes I and II, 1985 (D.N. Glover ed.); Oligonucleotide Synthesis, 1984 (M.J.
Gait ed.); Transcription and Translation, 1984 (Hames and Higgins eds.); Animal Cell
Culture, 1986 (R.I. Freshney ed.); Immobilized Cells and Enzymes, 1986 (IRL Press);
Perbal, 1984, A Practical Guide to Molecular Cloning; Gene Transfer Vectors for
Mammalian Cells, 1987 (J. H. Miller and M. P. Calos eds., Cold Spring Harbor
Laboratory); Martin J. Bishop, ed., Guide to Human Genome Computing, 2d Edition,
Academic Press, San Diego, CA (1998); and Leonard F. Peruski, Jr., and Anne
Harwood Peruski, The Internet and the New Biology: Tools for Genomic and
Molecular Research, American Society for Microbiology, Washington, D.C. (1997).

Any suitable materials and/or methods known to those of skill can be utilized
in carrying out the present invention; however, preferred materials and/or methods are
described. Materials, reagents and the like to which reference is made in the
following description and examples are obtainable from commercial sources, unless
otherwise noted.

C. albicans Genomic Sequence

This invention provides nucleotide sequences of the genome of C. albicans,
strain SC5314, which thus comprises a DNA sequence library of C. albicans genomic
DNA. The detailed description that follows provides nucleotide sequences of C.
albicans, and also describes how the sequences were obtained and how ORFs and
protein-coding sequences can be identified. Also described are methods of using the
disclosed C. albicans sequences in methods including diagnostic and therapeutic
applications. Furthermore, the library can be used as a database for identification and
comparison of medically important sequences in this and other strains of C. albicans.

To determine the genomic sequence of C. albicans, DNA from strain SC5314
of C. albicans was isolated after Zymolyase digestion, sodium dodecyl sulfate lysis,
potassium acetate precipitation, phenol:chloroform extraction and ethanol precipitation
(Soll, D.R., T. Srikantha and S.R. Lockhart: Characterizing Developmentally
Regulated Genes in C. albicans, In Microbial Genome Methods, K.W. Adolph, editor.
CRC Press. New York, p 17-37.). DNA was sheared hydrodynamically using an
HPLC (Oefner et al., 1996) to an insert size of 2000-3000 bp. After size
fractionation by gel electrophoresis the fragments were blunt-ended, ligated to adapter
oligonucleotides and cloned into the pGTC (Thomann) vector to construct a "shotgun"
subclone library.

DNA sequencing was achieved using established ABI sequencing methods on
ABI377 automated DNA sequencers. The cloning and sequencing procedures are
described in more detail in the Exemplification.

Individual sequence reads were assembled using PHRAP (P. Green, Abstracts 
of DOE Human Genome Program Contractor-Grantee Workshop V, Jan, 1996, 
p. 157). The average contig length was about 3-4 kb. 

All subsequent steps were based on sequencing by ABI377 automated DNA
sequencing methods.

A variety of approaches is used to order the contigs so as to obtain a
continuous sequence representing the entire C. albicans genome. Synthetic
oligonucleotides are designed that are complementary to sequences at the end of each
contig. These oligonucleotides may be hybridized to libraries of C. albicans genomic
DNA in, for example, lambda phage vectors or plasmid vectors to identify clones that
contain sequences corresponding to the junctional regions between individual contigs.
Such clones are then used to isolate template DNA and the same oligonucleotides are
used as primers in polymerase chain reaction (PCR) to amplify junctional fragments,
the nucleotide sequence of which is then determined.

The C. albicans sequences were analyzed for the presence of open reading
frames (ORFs) comprising at least about 180 nucleotides. Because the ORFs were
identified from stop-to-stop codon reads, it should be understood that these ORFs
may not correspond to the ORF of a naturally-occurring C. albicans polypeptide.
These ORFs may contain start codons which indicate the initiation of protein
synthesis of a naturally-occurring C. albicans polypeptide. Such start codons within
the ORFs provided herein can be identified by those of ordinary skill in the relevant
art, and the resulting ORF and the encoded C. albicans polypeptide are within the
scope of this invention. For example, within the ORFs a codon such as AUG or GUG
(encoding methionine or valine) which is part of the initiation signal for protein
synthesis can be identified and the portion of an ORF corresponding to a
naturally-occurring C. albicans polypeptide can be recognized. The predicted coding
regions were defined by evaluating the coding potential of such sequences with the
program GENEMARK (Borodovsky and McIninch, 1993, Comp. Chem. 17:123).
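
The stop-to-stop ORF identification described above can be illustrated with the following Python sketch. It is a simplified example under stated assumptions, not the procedure actually used: it scans the three frames of a single strand only (the opposite strand would be scanned by passing its reverse complement), it uses the standard stop codons TAA, TAG and TGA, and it reports only regions bounded by an in-frame stop codon on both sides. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons
MIN_ORF_NT = 180                     # from the text: at least about 180 nucleotides


def stop_to_stop_orfs(dna, min_len=MIN_ORF_NT):
    """Return (frame, start, end) for each stop-to-stop region of >= min_len nt.

    Coordinates are 0-based and end-exclusive; the bounding stop codons
    themselves are not included in the reported region.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        prev_stop = None  # position of the previous in-frame stop codon
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if prev_stop is not None and pos - (prev_stop + 3) >= min_len:
                    orfs.append((frame, prev_stop + 3, pos))
                prev_stop = pos
    return orfs


def first_start_codon(dna, start, end, starts=("ATG", "GTG")):
    """Return the position of the first in-frame start codon in [start, end), or None."""
    dna = dna.upper()
    for pos in range(start, end, 3):
        if dna[pos:pos + 3] in starts:
            return pos
    return None


# Toy sequence: stop, 10 sense codons, ATG, 60 sense codons, stop.
toy = "TAA" + "GCT" * 10 + "ATG" + "GCT" * 60 + "TGA"
for frame, start, end in stop_to_stop_orfs(toy):
    print(frame, start, end, first_start_codon(toy, start, end))
```

In the toy sequence the stop-to-stop region spans positions 3 to 216 in frame 0, and the first in-frame start codon lies at position 33, so the region upstream of position 33 would not be part of the putative naturally-occurring polypeptide.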

Each predicted ORF amino acid sequence was compared with all sequences
found in current GENBANK, SWISS-PROT, and PIR databases using the BLAST
algorithm. BLAST identifies local alignments between the ORF sequence and the
sequences in the databank and estimates the probability that each alignment
occurred by chance (Altschul et al., 1990, J. Mol. Biol. 215:403-410).
Homologous ORFs (probabilities less than 10^-5 by chance) and ORFs
that are probably non-homologous (probabilities greater than 10^-5 by chance) but have
good codon usage were identified. Both homologous sequences and non-homologous
sequences with good codon usage are likely to encode proteins and are
encompassed by the invention.
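
The two selection criteria above can be expressed as a small decision rule. The sketch below is illustrative only: the record type, its field names, and the boolean codon-usage flag are hypothetical, and how "good codon usage" is scored is not specified here.

```python
from dataclasses import dataclass
from typing import Optional

P_CUTOFF = 1e-5   # threshold stated in the text


@dataclass
class OrfHit:
    orf_id: str
    best_p_value: Optional[float]   # None when BLAST reported no match
    good_codon_usage: bool          # assumed to be computed elsewhere


def classify(hit: OrfHit) -> str:
    """Assign an ORF to one of the categories described in the text."""
    if hit.best_p_value is not None and hit.best_p_value < P_CUTOFF:
        return "homologous"
    if hit.good_codon_usage:
        return "non-homologous, good codon usage"
    return "not selected"
```

Both of the first two categories are treated in the text as likely protein-coding; only the third falls outside the selection.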

It is to be understood that non-protein-coding sequences contained within SEQ
ID NO: 1 - SEQ ID NO: 14103 are also within the scope of the invention. Such
sequences include, without limitation, sequences important for replication,
recombination, transcription and translation. Non-limiting examples include
promoters and regulatory binding sites involved in regulation of gene expression, and
5'- and 3'- untranslated sequences (e.g., ribosome-binding sites) that form part of
mRNA molecules.

Preferred sequences are those that are useful in diagnostic and/or therapeutic
applications. Diagnostic applications include without limitation nucleic-acid-based
and antibody-based methods for detecting C. albicans infection. Therapeutic
applications include without limitation vaccines, passive immunotherapy, and drug
treatments directed against gene products that are essential for growth and/or
replication. In a particularly preferred aspect of the invention, the nucleic acids
encode protein-coding sequences which share homology to other fungal sequences,
lack homology to all eukaryotic sequences, and which are essential to the viability of
fungi. Such sequences comprise a library of valuable target sequences for drug
discovery, in particular, targets which may be used to identify broad spectrum
antifungal agents.

C. albicans Nucleic Acids

The present invention provides a library of C. albicans-derived nucleic acid
sequences. The libraries provide probes, primers, and markers which can be used as
markers in epidemiological studies. The present invention also provides a library of
C. albicans-derived nucleic acid sequences which comprise or encode targets for
therapeutic drugs.

The nucleic acids of this invention are obtained directly from the DNA of the
above referenced C. albicans strain by using the polymerase chain reaction (PCR).
See "PCR, A Practical Approach" (McPherson, Quirke, and Taylor, eds., IRL Press,
Oxford, UK, 1991) for details about the PCR. High fidelity PCR can be used to
ensure a faithful DNA copy prior to expression. In addition, the authenticity of
amplified products can be verified by conventional sequencing methods. Clones
carrying the desired sequences described in this invention may also be obtained by
screening the libraries by means of the PCR or by hybridization of synthetic
oligonucleotide probes to filter lifts of the library colonies or plaques as known in the
art (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd edition,
1989, Cold Spring Harbor Press, NY).

It is also possible to obtain nucleic acids encoding C. albicans polypeptides
from a cDNA library in accordance with protocols herein described. A cDNA
encoding a C. albicans polypeptide can be obtained by isolating total mRNA from an
appropriate strain. Double stranded cDNAs can then be prepared from the total
mRNA. Subsequently, the cDNAs can be inserted into a suitable plasmid or viral
(e.g., bacteriophage) vector using any one of a number of known techniques. Genes
encoding C. albicans polypeptides can also be cloned using established polymerase
chain reaction techniques in accordance with the nucleotide sequence information
provided by the invention. The nucleic acids of the invention can be DNA or RNA.
Preferred nucleic acids of the invention are contained in the Sequence Listing.

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing
polydeoxynucleotides are known, including solid-phase synthesis which, like peptide
synthesis, has been fully automated in commercially available DNA synthesizers (See
e.g., Itakura et al. U.S. Patent No. 4,598,049; Caruthers et al. U.S. Patent No.
4,458,066; and Itakura U.S. Patent Nos. 4,401,796 and 4,373,071, incorporated by
reference herein).

In another example, DNA can be chemically synthesized using, e.g., the
phosphoramidite solid support method of Matteucci et al., 1981, J. Am. Chem. Soc.
103:3185, the method of Yoo et al., 1989, J. Biol. Chem. 264:17078, or other well
known methods. This can be done by sequentially linking a series of oligonucleotide
cassettes comprising pairs of synthetic oligonucleotides, as described below.

Nucleic acids isolated or synthesized in accordance with features of the
present invention are useful, by way of example, without limitation, as probes,
primers, capture ligands, antisense genes and for developing expression systems for
the synthesis of proteins and peptides corresponding to such sequences. As probes,
primers, capture ligands and antisense agents, the nucleic acid normally consists of all
or part (approximately twenty or more nucleotides for specificity as well as the ability
to form stable hybridization products) of the nucleic acids of the invention contained
in the Sequence Listing. These uses are described in further detail below.
Probes 

A nucleic acid isolated or synthesized in accordance with the sequence of the
invention contained in the Sequence Listing can be used as a probe to specifically
detect C. albicans. With the sequence information set forth in the present application,
sequences of about twenty or more nucleotides are identified which provide the
desired inclusivity and exclusivity with respect to C. albicans and extraneous nucleic
acids likely to be encountered during hybridization conditions. More preferably, the
sequence will comprise at least about twenty to thirty nucleotides to convey stability
to the hybridization product formed between the probe and the intended target
molecules.

Sequences larger than about 1000 nucleotides in length are difficult to
synthesize but can be generated by recombinant DNA techniques. Individuals skilled
in the art will readily recognize that the nucleic acids, for use as probes, can be
provided with a label to facilitate detection of a hybridization product.

Nucleic acid isolated or synthesized in accordance with the sequence of the
invention contained in the Sequence Listing can also be useful as probes to detect
homologous regions (especially homologous genes) of other Candida species using
appropriate stringency hybridization conditions as described herein.
Capture Ligand 

For use as a capture ligand, the nucleic acid selected in the manner described
above with respect to probes can be readily associated with a support. The manner in
which nucleic acid is associated with supports is well known. Nucleic acid having
twenty or more nucleotides in a sequence of the invention contained in the Sequence
Listing has utility to separate C. albicans nucleic acid of one strain from the
nucleic acid of another strain as well as from other organisms. Nucleic acid
having twenty or more nucleotides in a sequence of the invention contained in the
Sequence Listing can also have utility to separate other Candida species from each
other and from other organisms. Preferably, the sequence will comprise at least about
twenty nucleotides to convey stability to the hybridization product formed between
the probe and the intended target molecules. Sequences larger than 1000 nucleotides
in length are difficult to synthesize but can be generated by recombinant DNA
techniques.

Primers 

Nucleic acid isolated or synthesized in accordance with the sequences
described herein has utility as primers for the amplification of C. albicans nucleic
acid. These nucleic acids may also have utility as primers for the amplification of
nucleic acids in other Candida species. With respect to polymerase chain reaction
(PCR) techniques, nucleic acid sequences of at least about 10-15 nucleotides of the
invention contained in the Sequence Listing have utility in conjunction with suitable
enzymes and reagents to create copies of C. albicans nucleic acid. More preferably,
the sequence will comprise at least about twenty or more nucleotides to convey
stability to the hybridization product formed between the primer and the intended
target molecules. Binding conditions of primers greater than about 100 nucleotides
are often more difficult to control to obtain specificity. High fidelity PCR can be used
to ensure a faithful DNA copy prior to expression. In addition, amplified products
can be checked by conventional sequencing methods.

The copies can be used in diagnostic assays to detect specific sequences,
including genes from C. albicans and/or other Candida species. The copies can also
be incorporated into cloning and expression vectors to generate polypeptides
corresponding to the nucleic acid synthesized by PCR, as is described in greater detail
herein.
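
As a rough illustration of the primer lengths discussed above, the sketch below takes a forward and a reverse primer of about twenty nucleotides from the two ends of a target region and estimates their melting temperature with the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C). The rule is only a coarse approximation for short oligonucleotides, and its use here, the function names, and the default length are assumptions of the example, not part of the disclosed method.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (degrees C) by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def primer_pair(template: str, start: int, end: int, length: int = 20) -> Tuple[str, str]:
    """Return (forward, reverse) primers flanking template[start:end]."""
    target = template[start:end].upper()
    if len(target) < 2 * length:
        raise ValueError("target region is too short for two primers")
    forward = target[:length]                       # same sense as the template
    reverse = reverse_complement(target[-length:])  # anneals to the far end
    return forward, reverse
```

In practice the two primers would also be screened for matching melting temperatures and for unintended binding sites elsewhere in the genome.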

The nucleic acids of the present invention find use as templates for the 
recombinant production of C. albicans-derived peptides or polypeptides. 
Antisense 

Nucleic acid or nucleic acid-hybridizing derivatives isolated or synthesized in
accordance with the sequences described herein have utility as antisense agents to
prevent the expression of C. albicans genes. These sequences also have utility as
antisense agents to prevent expression of genes of other Candida species.

In one embodiment, nucleic acid or derivatives corresponding to C. albicans
nucleic acids are loaded into a suitable carrier such as a liposome or bacteriophage for
introduction into fungal cells. For example, a nucleic acid having about twenty or
more nucleotides is capable of binding to fungal nucleic acid or fungal messenger
RNA. Preferably, the antisense nucleic acid is comprised of at least about 20 or more
nucleotides to provide necessary stability of a hybridization product of non-naturally
occurring nucleic acid and fungal nucleic acid and/or fungal messenger RNA.
Nucleic acid having a sequence greater than about 1000 nucleotides in length is
difficult to synthesize but can be generated by recombinant DNA techniques.
Methods for loading antisense nucleic acid in liposomes are known in the art as
exemplified, for example, in U.S. Patent 4,241,046 issued December 23, 1980 to
Papahadjopoulos et al.
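
The base-pairing relationship between an antisense oligonucleotide and its target messenger RNA can be sketched as below. The function name, the choice of a DNA oligonucleotide, and the default length of twenty are assumptions for illustration; the example does not address delivery, stability, or target-site selection.

```python
def antisense_oligo(mrna: str, start: int, length: int = 20) -> str:
    """Return a DNA oligo (5'->3') complementary to mrna[start:start+length]."""
    target = mrna.upper().replace("T", "U")[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the mRNA")
    pairs = {"A": "T", "U": "A", "G": "C", "C": "G"}   # RNA base -> DNA partner
    # Reverse the window so the result is written 5'->3'.
    return "".join(pairs[base] for base in reversed(target))
```

For instance, `antisense_oligo("AUGC", 0, 4)` gives `"GCAT"`.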

The present invention encompasses isolated polypeptides and nucleic acids
derived from C. albicans that are useful as reagents for diagnosis of fungal infection,
components of effective anti-fungal vaccines, and/or as targets for anti-fungal drugs,
including anti-C. albicans drugs.

Expression of C. albicans Nucleic Acids

Table 2, which is appended herewith and which forms part of the present
specification, provides a list of open reading frames (ORFs) in both strands and a
putative identification of the particular function of a polypeptide which is encoded by
each ORF, based on the homology match (determined by the BLAST algorithm) of
the predicted polypeptide with known proteins encoded by ORFs in other organisms.
An ORF is a region of nucleic acid which encodes a polypeptide. This region may
represent a portion of a coding sequence or a total sequence and was determined from
stop to stop codons. The first column contains a designation for the contig from
which each ORF was identified (numbered arbitrarily). Each contig represents a
continuous stretch of the genomic sequence of the organism. The second column lists
the ORF designation. The third and fourth columns list the SEQ ID numbers for the
nucleic acid and amino acid sequences corresponding to each ORF, respectively. The
fifth and sixth columns list the length of the nucleic acid ORF and the length of the
amino acid ORF, respectively. The nucleotide sequence corresponding to each ORF
begins at the first nucleotide immediately following a stop codon and ends at the
nucleotide immediately preceding the next downstream stop codon in the same
reading frame. It will be recognized by one skilled in the art that the natural
translation initiation sites will correspond to ATG, GTG, or TTG codons located
within the ORFs. The natural initiation sites depend not only on the sequence of a
start codon but also on the context of the DNA sequence adjacent to the start codon.
Usually, a recognizable ribosome binding site is found within 20 nucleotides upstream
from the initiation codon. In some cases where genes are translationally coupled and
coordinately expressed together in "operons," ribosome binding sites are not present,
but the initiation codon of a downstream gene may occur very close to, or overlap, the
stop codon of an upstream gene in the same operon. The correct start codons can
generally be identified without undue experimentation because only a few codons
need be tested. It is recognized that the translational machinery in bacteria initiates all
polypeptide chains with the amino acid methionine, regardless of the sequence of the
start codon. In some cases, polypeptides are post-translationally modified, resulting
in an N-terminal amino acid other than methionine in vivo. The seventh column
provides, where available, either a public database accession number or our own
sequence name. The eighth and ninth columns provide metrics for assessing the
likelihood of the homology match (determined by the BLASTP2 algorithm), as is
known in the art, to the genes indicated in the eleventh column when the designated
ORF was compared against a non-redundant comprehensive protein database.
Specifically, the eighth column represents the "Blast Score" for the match (a higher
score is a better match), and the ninth column represents the "P-value" for the match
(the probability that such a match could have occurred by chance; the lower the value,
the more likely the match is valid). If a BLASTP2 score of less than 46 was obtained,
no value is reported in the table for the "P-value." Column ten provides the name of the
organism that was identified as having the closest homology match. The eleventh
column provides, where available, the Swissprot accession number (SP), the
locus name (LN), the Organism (OR), Source of variant (SR), E.C. number (EC), the
gene name (GN), the product name (PN), the Function Description (FN), Left End
(LE), Right End (RE), Coding Direction (DI), and the description (DE) or notes (NT)
for each ORF. Information that is not preceded by a code designation in the eleventh
column represents a description of the ORF. This information allows one of ordinary
skill in the art to determine a potential use for each identified coding sequence and, as
a result, allows the use of the polypeptides of the present invention for commercial and
industrial purposes.
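
The eleven-column layout of Table 2 described above maps naturally onto a record type. The sketch below assumes, purely for illustration, that a row is available as one tab-separated line with empty fields where no value is reported; the actual file format of Table 2, the class name, and the field names are not given in this specification and are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Table2Row:
    contig: str                  # column 1: contig designation
    orf: str                     # column 2: ORF designation
    nt_seq_id: int               # column 3: SEQ ID NO of the nucleic acid
    aa_seq_id: int               # column 4: SEQ ID NO of the amino acid sequence
    nt_length: int               # column 5: ORF length in nucleotides
    aa_length: int               # column 6: ORF length in amino acids
    accession: Optional[str]     # column 7: accession number or sequence name
    blast_score: Optional[int]   # column 8: Blast Score
    p_value: Optional[float]     # column 9: P-value (blank for low scores)
    organism: Optional[str]      # column 10: organism of the closest match
    annotation: Optional[str]    # column 11: coded fields (SP, LN, GN, ...)


def parse_row(line: str) -> Table2Row:
    """Parse one tab-separated row (hypothetical serialization of Table 2)."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}")
    text = [f or None for f in fields]   # empty string -> None
    return Table2Row(
        contig=fields[0],
        orf=fields[1],
        nt_seq_id=int(fields[2]),
        aa_seq_id=int(fields[3]),
        nt_length=int(fields[4]),
        aa_length=int(fields[5]),
        accession=text[6],
        blast_score=int(text[7]) if text[7] else None,
        p_value=float(text[8]) if text[8] else None,
        organism=text[9],
        annotation=text[10],
    )
```

Keeping the P-value optional mirrors the statement that no P-value is reported when the BLASTP2 score is below 46.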

Using the information provided in SEQ ID NO: 1 - SEQ ID NO: 14103, SEQ
ID NO: 14104 - SEQ ID NO: 28206 and in Table 2 together with routine cloning and
sequencing methods, one of ordinary skill in the art will be able to clone and sequence
all the nucleic acid fragments of interest including open reading frames (ORFs)
encoding a large variety of C. albicans proteins.

Nucleic acid isolated or synthesized in accordance with the sequences
described herein has utility to generate polypeptides. The nucleic acid of the
invention exemplified in SEQ ID NO: 1 - SEQ ID NO: 14103 and in Table 2 or
fragments of said nucleic acid encoding active portions of C. albicans polypeptides
can be cloned into suitable vectors or used to isolate nucleic acid. The isolated
nucleic acid is combined with suitable DNA linkers and cloned into a suitable vector.

The function of a specific gene or operon can be ascertained by expression in a
fungal strain under conditions where the activity of the gene product(s) specified by
the gene or operon in question can be specifically measured. Alternatively, a gene
product may be produced in large quantities in an expressing strain for use as an
antigen, an industrial reagent, for structural studies, etc. This expression can be
accomplished in a mutant strain which lacks the activity of the gene to be tested, or in
a strain that does not produce the same gene product(s). This includes, but is not
limited to, eucaryotic species such as the yeast Saccharomyces cerevisiae or Candida
putida, Methanobacterium strains or other Archaea, and Eubacteria such as E. coli, B.
subtilis, S. aureus, S. pneumoniae or Pseudomonas putida. In some cases the
expression host will utilize the natural C. albicans promoter whereas in others, it will
be necessary to drive the gene with a promoter sequence derived from the expressing
organism (e.g., an E. coli beta-galactosidase promoter for expression in E. coli).

To express a gene product using the natural C. albicans promoter, a procedure
such as the following can be used. A restriction fragment containing the gene of
interest, together with its associated natural promoter element and regulatory
sequences (identified using the DNA sequence data), is cloned into an appropriate
recombinant plasmid containing an origin of replication that functions in the host
organism and an appropriate selectable marker. This can be accomplished by a
number of procedures known to those skilled in the art. It is most preferably done by
cutting the plasmid and the fragment to be cloned with the same restriction enzyme to
produce compatible ends that can be ligated to join the two pieces together. The
recombinant plasmid is introduced into the host organism by, for example,
electroporation, and cells containing the recombinant plasmid are identified by
selection for the marker on the plasmid. Expression of the desired gene product is
detected using an assay specific for that gene product.

In the case of a gene that requires a different promoter, the body of the gene
(coding sequence) is specifically excised and cloned into an appropriate expression
plasmid. This subcloning can be done by several methods, but is most easily
accomplished by PCR amplification of a specific fragment and ligation into an
expression plasmid after treating the PCR product with a restriction enzyme or
exonuclease to create suitable ends for cloning.

A suitable host cell for expression of a gene can be any procaryotic or
eucaryotic cell. Suitable methods for transforming host cells can be found in
Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring
Harbor Laboratory Press (1989)), and other laboratory textbooks.

For example, a host cell transfected with a nucleic acid vector directing
expression of a nucleotide sequence encoding a C. albicans polypeptide can be
cultured under appropriate conditions to allow expression of the polypeptide to occur.
Suitable media for cell culture are well known in the art. Polypeptides of the
invention can be isolated from cell culture medium, host cells, or both using
techniques known in the art for purifying proteins including ion-exchange
chromatography, gel filtration chromatography, ultrafiltration, electrophoresis, and
immunoaffinity purification with antibodies specific for such polypeptides.
Additionally, in many situations, polypeptides can be produced by chemical or
enzymatic cleavage of a native protein (e.g., tryptic digestion) and the cleavage
products can then be purified by standard techniques.

In the case of membrane bound proteins, these can be isolated from a host cell
by contacting a membrane-associated protein fraction with a detergent forming a
solubilized complex, where the membrane-associated protein is no longer entirely
embedded in the membrane fraction and is solubilized at least to an extent which
allows it to be chromatographically isolated from the membrane fraction.
Chromatographic techniques which can be used in the final purification step are
known in the art and include hydrophobic interaction, lectin affinity, ion exchange,
dye affinity and immunoaffinity.

One strategy to maximize recombinant C. albicans peptide expression in E.
coli is to express the protein in a host bacterium with an impaired capacity to
proteolytically cleave the recombinant protein (Gottesman, S., Gene Expression
Technology: Methods in Enzymology 185, Academic Press, San Diego, California
(1990) 119-128). Another strategy would be to alter the nucleic acid encoding a C.
albicans peptide to be inserted into an expression vector so that the individual codons
for each amino acid would be those preferentially utilized in highly expressed E. coli
proteins (Wada et al., (1992) Nucl. Acids Res. 20:2111-2118). Such alteration of
nucleic acids of the invention can be carried out by standard DNA synthesis
techniques.
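
The codon alteration strategy above can be sketched as a codon-by-codon recoding. The preferred-codon table below is an illustrative placeholder chosen for the example and is not taken from Wada et al.; a real application would substitute measured codon frequencies for the chosen host. Because C. albicans translates CUG as serine rather than leucine (see the Background), the sketch optionally reads CTG as serine before recoding; that option and all names are assumptions of the example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Illustrative placeholder: one codon per amino acid for the expression host.
PREFERRED = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def recode_for_host(dna: str, ctg_as_serine: bool = True) -> str:
    """Re-encode a coding sequence using the host's preferred codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        if codon == "CTG" and ctg_as_serine:
            amino_acid = "S"          # C. albicans decodes CUG as serine
        out.append(PREFERRED[amino_acid])
    return "".join(out)
```

With the default setting, a CTG codon in the source gene is emitted as a serine codon, so the polypeptide made in the host keeps the residue that C. albicans itself would incorporate.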

The nucleic acids of the invention can also be chemically synthesized using
standard techniques. Various methods of chemically synthesizing
polydeoxynucleotides are known, including solid-phase synthesis which, like peptide
synthesis, has been fully automated in commercially available DNA synthesizers
(See, e.g., Itakura et al. U.S. Patent No. 4,598,049; Caruthers et al. U.S. Patent No.
4,458,066; and Itakura U.S. Patent Nos. 4,401,796 and 4,373,071, incorporated by
reference herein). The present invention provides a library of C. albicans-derived
nucleic acid sequences. The libraries provide probes, primers, and markers which can
be used as markers in epidemiological studies. The present invention also provides a
library of C. albicans-derived nucleic acid sequences which comprise or encode
targets for therapeutic drugs.

Nucleic acids comprising any of the sequences disclosed herein or
sub-sequences thereof can be prepared by standard methods using the nucleic acid
sequence information provided in SEQ ID NO: 1 - SEQ ID NO: 14103. For example,
DNA can be chemically synthesized using, e.g., the phosphoramidite solid support
method of Matteucci et al., 1981, J. Am. Chem. Soc. 103:3185, the method of Yoo et
al., 1989, J. Biol. Chem. 264:17078, or other well known methods. This can be done
by sequentially linking a series of oligonucleotide cassettes comprising pairs of
synthetic oligonucleotides, as described below.

Of course, due to the degeneracy of the genetic code, many different
nucleotide sequences can encode polypeptides having the amino acid sequences
defined by SEQ ID NO: 14104 - SEQ ID NO: 28206 or sub-sequences thereof. The
codons can be selected for optimal expression in prokaryotic or eukaryotic systems.
Such degenerate variants are also encompassed by this invention.
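
To give a sense of how many degenerate variants exist, the sketch below counts the distinct nucleotide sequences that encode a given peptide under the standard genetic code. It is an illustration only; it ignores the CUG reassignment in C. albicans, and the function name is an assumption.

```python
from collections import Counter
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_AA = Counter(STANDARD_AA)   # e.g. L -> 6, M -> 1, K -> 2


def count_encodings(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (one-letter codes)."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(CODONS_PER_AA)
    if unknown:
        raise ValueError(f"unknown residue(s): {sorted(unknown)}")
    return prod(CODONS_PER_AA[aa] for aa in peptide)
```

The count grows multiplicatively with peptide length, which is why even a short polypeptide has a very large family of degenerate coding sequences.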

Insertion of nucleic acids (typically DNAs) encoding the polypeptides of the
invention into a vector is easily accomplished when the termini of both the DNAs and
the vector comprise compatible restriction sites. If this cannot be done, it may be
necessary to modify the termini of the DNAs and/or vector by digesting back
single-stranded DNA overhangs generated by restriction endonuclease cleavage to
produce blunt ends, or to achieve the same result by filling in the single-stranded
termini with an appropriate DNA polymerase.

Alternatively, any site desired may be produced, e.g., by ligating nucleotide
sequences (linkers) onto the termini. Such linkers may comprise specific
oligonucleotide sequences that define desired restriction sites. Restriction sites can
also be generated by the use of the polymerase chain reaction (PCR). See, e.g., Saiki
et al., 1988, Science 239:48. The cleaved vector and the DNA fragments may also be
modified if required by homopolymeric tailing.

The nucleic acids of the invention may be isolated directly from cells.
Alternatively, the polymerase chain reaction (PCR) method can be used to produce
the nucleic acids of the invention, using either chemically synthesized strands or
genomic material as templates. Primers used for PCR can be synthesized using the
sequence information provided herein and can further be designed to introduce
appropriate new restriction sites, if desirable, to facilitate incorporation into a given
vector for recombinant expression.
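
Designing a primer that introduces a new restriction site, as described above, amounts to adding the recognition sequence to the 5' end of the gene-specific primer. The sketch below is illustrative: the three enzymes and their recognition sequences are standard, but the short 5' clamp, its default value, and the function names are assumptions of the example.

```python
# Recognition sequences of three common restriction enzymes (all palindromic).
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def add_site(primer: str, enzyme: str, clamp: str = "GC") -> str:
    """Prefix a gene-specific primer with a restriction site and a 5' clamp.

    The clamp is a few extra bases (an assumed default here) placed 5' of the
    site so that the enzyme has flanking sequence to bind near the fragment end.
    """
    return clamp + SITES[enzyme] + primer.upper()


def site_free(insert: str, enzyme: str) -> bool:
    """True if the insert contains no internal copy of the enzyme's site.

    The sites listed above are palindromic, so checking one strand suffices.
    """
    return SITES[enzyme] not in insert.upper()
```

Checking the insert with `site_free` before choosing an enzyme avoids cutting the amplified fragment internally when it is digested for cloning.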

The nucleic acids of the present invention may be flanked by natural C.
albicans regulatory sequences, or may be associated with heterologous sequences,
including promoters, enhancers, response elements, signal sequences, polyadenylation
sequences, introns, 5'- and 3'- noncoding regions, and the like. The nucleic acids may
also be modified by many means known in the art. Non-limiting examples of such
modifications include methylation, "caps", substitution of one or more of the naturally
occurring nucleotides with an analog, internucleotide modifications such as, for
example, those with uncharged linkages (e.g., methyl phosphonates, phosphotriesters,
phosphoroamidates, carbamates, etc.) and with charged linkages (e.g.,
phosphorothioates, phosphorodithioates, etc.). Nucleic acids may contain one or more
additional covalently linked moieties, such as, for example, proteins (e.g., nucleases,
toxins, antibodies, signal peptides, poly-L-lysine, etc.), intercalators (e.g., acridine,
psoralen, etc.), chelators (e.g., metals, radioactive metals, iron, oxidative metals, etc.),
and alkylators. PNAs are also included. The nucleic acid may be derivatized by
formation of a methyl or ethyl phosphotriester or an alkyl phosphoramidate linkage.
Furthermore, the nucleic acid sequences of the present invention may also be
modified with a label capable of providing a detectable signal, either directly or
indirectly. Exemplary labels include radioisotopes, fluorescent molecules, biotin, and
the like.

The invention also provides nucleic acid vectors comprising the disclosed C.
albicans-derived sequences or derivatives or fragments thereof. A large number of
vectors, including plasmid and fungal vectors, have been described for replication
and/or expression in a variety of eukaryotic and prokaryotic hosts, and may be used
for cloning or protein expression.

The encoded C. albicans polypeptides may be expressed by using many
known vectors, such as pUC plasmids, pET plasmids (Novagen, Inc., Madison, WI),
or pRSET or pREP (Invitrogen, San Diego, CA), and many appropriate host cells,
using methods disclosed or cited herein or otherwise known to those skilled in the
relevant art. The particular choice of vector/host is not critical to the practice of the
invention.

Recombinant cloning vectors will often include one or more replication
systems for cloning or expression, one or more markers for selection in the host, e.g.
antibiotic resistance, and one or more expression cassettes. The inserted C. albicans
coding sequences may be synthesized by standard methods, isolated from natural
sources, or prepared as hybrids, etc. Ligation of the C. albicans coding sequences to
transcriptional regulatory elements and/or to other amino acid coding sequences may
be achieved by known methods. Suitable host cells may be
transformed/transfected/infected as appropriate by any suitable method including
electroporation, CaCl2-mediated DNA uptake, fungal infection, microinjection,
microprojectile, or other established methods.

Appropriate host cells include bacteria, archaebacteria, fungi, especially yeast,
and plant and animal cells, especially mammalian cells. Of particular interest are C.
albicans, E. coli, B. subtilis, Saccharomyces cerevisiae, Saccharomyces
carlsbergensis, Schizosaccharomyces pombe, SF9 cells, C129 cells, 293 cells,
Neurospora, and CHO cells, COS cells, HeLa cells, and immortalized mammalian
myeloid and lymphoid cell lines. Preferred replication systems include M13, ColE1,
SV40, baculovirus, lambda, adenovirus, and the like. A large number of transcription
initiation and termination regulatory regions have been isolated and shown to be
effective in the transcription and translation of heterologous proteins in the various
hosts. Examples of these regions, methods of isolation, manner of manipulation, etc.
are known in the art. Under appropriate expression conditions, host cells can be used
as a source of recombinantly produced C. albicans-derived peptides and polypeptides.

Advantageously, vectors may also include a transcription regulatory element
(i.e., a promoter) operably linked to the C. albicans portion. The promoter may
optionally contain operator portions and/or ribosome binding sites. Non-limiting
examples of promoters compatible with E. coli include: β-lactamase
(penicillinase) promoter; lactose promoter; tryptophan (trp) promoter; araBAD
(arabinose) operon promoter; lambda-derived PL promoter and N gene ribosome
binding site; and the hybrid tac promoter derived from sequences of the trp and lac
UV5 promoters. Non-limiting examples of yeast promoters include
3-phosphoglycerate kinase promoter, glyceraldehyde-3-phosphate dehydrogenase
(GAPDH) promoter, galactokinase (GAL1) promoter, galactoepimerase promoter,
and alcohol dehydrogenase (ADH) promoter. Suitable promoters for mammalian cells
include without limitation viral promoters such as that from Simian Virus 40 (SV40),
Rous sarcoma virus (RSV), adenovirus (ADV), and bovine papilloma virus (BPV).
Mammalian cells may also require terminator sequences, polyA addition sequences
and enhancer sequences to increase expression. Sequences which cause amplification
of the gene may also be desirable. Furthermore, sequences that facilitate secretion of
the recombinant product from cells, including, but not limited to, bacteria, yeast, and
animal cells, such as secretory signal sequences and/or prohormone pro region
sequences, may also be included. These sequences are well described in the art.

Nucleic acids encoding wild-type or variant C. albicans-derived polypeptides
may also be introduced into cells by recombination events. For example, such a
sequence can be introduced into a cell, and thereby effect homologous recombination
at the site of an endogenous gene or a sequence with substantial identity to the gene.
Other recombination-based methods such as nonhomologous recombinations or
deletion of endogenous genes by homologous recombination may also be used.

The nucleic acids of the present invention find use as templates for the
recombinant production of C. albicans-derived peptides or polypeptides.

Identification and Use of C. albicans Nucleic Acid Sequences

The disclosed C. albicans polypeptide and nucleic acid sequences, or other
sequences that are contained within ORFs, including complete protein-coding
sequences, of which any of the disclosed C. albicans-specific sequences forms a part,
are useful as target components for diagnosis and/or treatment of C. albicans-caused
infection.

It will be understood that the sequence of an entire protein-coding sequence of
which each disclosed nucleic acid sequence forms a part can be isolated and identified
based on each disclosed sequence. This can be achieved, for example, by using an
isolated nucleic acid encoding the disclosed sequence, or fragments thereof, to prime
a sequencing reaction with genomic C. albicans DNA as template; this is followed by
sequencing the amplified product. The isolated nucleic acid encoding the disclosed
sequence, or fragments thereof, can also be hybridized to C. albicans genomic
libraries to identify clones containing additional complete segments of the
protein-coding sequence of which the shorter sequence forms a part. Then, the entire
protein-coding sequence, or fragments thereof, or nucleic acids encoding all or part of
the sequence, or sequence-conservative or function-conservative variants thereof, may
be employed in practicing the present invention.

Preferred sequences are those that are useful in diagnostic and/or therapeutic 
applications. Diagnostic applications include without limitation nucleic-acid-based 
and antibody-based methods for detecting fungal infection. Therapeutic applications 
include without limitation vaccines, passive immunotherapy, and drug treatments 
directed against gene products that are both unique to fungi and essential for growth 
and/or replication of fungi. 

Identification of Nucleic Acids Encoding Vaccine Components and Targets for 
Agents Effective Against C. albicans 

The disclosed C. albicans genome sequence includes segments that direct the 
synthesis of ribonucleic acids and polypeptides, as well as origins of replication, 
promoters, other types of regulatory sequences, and intergenic nucleic acids. The 
invention encompasses nucleic acids encoding immunogenic components of vaccines 
and targets for agents effective against C. albicans. Identification of said 
immunogenic components involves the determination of the function of the 
disclosed sequences, which can be achieved using a variety of approaches. 
Non-limiting examples of these approaches are described briefly below. 

Homology to known sequences: 

Computer-assisted comparison of the disclosed C. albicans sequences with 
previously reported sequences present in publicly available databases is useful for 
identifying functional C. albicans nucleic acid and polypeptide sequences. It will be 
understood that protein-coding sequences, for example, may be compared as a whole, 
and that a high degree of sequence homology between two proteins (such as, for 
example, >80-90%) at the amino acid level indicates that the two proteins also possess 
some degree of functional homology, such as, for example, among enzymes involved 
in metabolism, DNA synthesis, or cell wall synthesis, and proteins involved in 
transport, cell division, etc. In addition, many structural features of particular protein 
classes have been identified and correlate with specific consensus sequences, such as, 
for example, binding domains for nucleotides, DNA, metal ions, and other small 
molecules; sites for covalent modifications such as phosphorylation, acylation, and 
the like; sites of protein:protein interactions, etc. These consensus sequences may be 
quite short and thus may represent only a fraction of the entire protein-coding 
sequence. Identification of such a feature in a C. albicans sequence is therefore useful 
in determining the function of the encoded protein and identifying useful targets of 
antifungal drugs. 
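
As an illustration of how a short consensus sequence can be located within a 
predicted protein, the following minimal Python sketch scans an amino acid sequence 
for a nucleotide-binding consensus. The pattern used (the P-loop, or Walker A, 
consensus [AG]-x(4)-G-K-[ST]), the function name find_motifs, and the example 
sequence are assumptions chosen for illustration only; they are not taken from the 
Sequence Listing. 

```python
import re

# Assumed example pattern: P-loop (Walker A) nucleotide-binding consensus,
# [AG]-x(4)-G-K-[ST]. Any other consensus sequence could be substituted.
WALKER_A = re.compile(r"[AG].{4}GK[ST]")


def find_motifs(protein, pattern=WALKER_A):
    """Return (1-based start position, matched text) for each occurrence."""
    # A lookahead is used so that overlapping occurrences are also reported.
    lookahead = re.compile("(?=(" + pattern.pattern + "))")
    return [(m.start() + 1, m.group(1)) for m in lookahead.finditer(protein)]


# Hypothetical sequence used only to exercise the function.
example = "MSTLKGPAGSGKSTLLRAVE"
hits = find_motifs(example)
```

A match of this kind only suggests a possible function; it would ordinarily be 
weighed together with whole-sequence comparisons of the type described above. 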

Of particular relevance to the present invention are structural features that are 
common to secretory, transmembrane, and surface proteins, including secretion signal 
peptides and hydrophobic transmembrane domains. C. albicans proteins identified as 
containing putative signal sequences and/or transmembrane domains are useful as 
immunogenic components of vaccines. 
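
One simple way to flag candidate hydrophobic transmembrane domains is a 
sliding-window hydropathy scan. The Python sketch below is offered only as an 
illustration: it uses the Kyte-Doolittle hydropathy scale, which is not named in this 
specification, and the window length of 19 residues, the threshold of 1.6, and both 
example sequences are assumptions. 

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose mean hydropathy
    exceeds the threshold (window and threshold are assumed defaults)."""
    starts = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KD[aa] for aa in segment) / window
        if mean > threshold:
            starts.append(i + 1)
    return starts


# Hypothetical sequences used only to exercise the function.
membrane_like = "MKK" + "LLIVALLAVFLIAGLLVAL" + "RKDE"
soluble_like = "MKDEERKSDNQEKRDSTEEKKDN"
```

Sequences containing letters outside the twenty standard amino acids raise a 
KeyError in this sketch; a fuller implementation would decide how to score them. 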

Targets for therapeutic drugs according to the invention include, but are not 
limited to, polypeptides of the invention, whether unique to C. albicans or not, that 
are essential for growth and/or viability of C. albicans under at least one growth 
condition. Polypeptides essential for growth and/or viability can be determined by 
examining the effect of deleting and/or disrupting the genes, i.e., by so-called gene 
"knockout". Alternatively, genetic footprinting can be used (Smith et al., 1995, Proc. 
Natl. Acad. Sci. USA 92:6479-6483; Published International Application WO 
94/26933; U.S. Patent No. 5,612,180). Still other methods for assessing essentiality 
include the ability to isolate conditional lethal mutations in the specific gene (e.g., 
temperature sensitive mutations). Other useful targets for therapeutic drugs, which 
include polypeptides that are not essential for growth or viability per se but lead to 
loss of viability of the cell, can be used to target therapeutic agents to cells. 

Strain-specific sequences: 

Because of the evolutionary relationship between different C. albicans strains, 
it is believed that the presently disclosed C. albicans sequences are useful for 
identifying, and/or discriminating between, previously known and new C. albicans 
strains. It is believed that other C. albicans strains will exhibit at least about 70% 
sequence homology with the presently disclosed sequence. Systematic and routine 
analyses of DNA sequences derived from samples containing C. albicans strains, and 
comparison with the present sequence, allow for the identification of sequences that 
can be used to discriminate between strains, as well as those that are common to all 
C. albicans strains. In one embodiment, the invention provides nucleic acids, including 
probes, and peptide and polypeptide sequences that discriminate between different 
strains of C. albicans. Strain-specific components can also be identified functionally 
by their ability to elicit or react with antibodies that selectively recognize one or more 
C. albicans strains. 

In another embodiment, the invention provides nucleic acids, including 
probes, and peptide and polypeptide sequences that are common to all C. albicans 
strains but are not found in other fungal species. 
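
The comparison of a sequence from a sample with a disclosed sequence can be 
sketched as a percent-identity calculation over an existing pairwise alignment. In the 
Python example below, treating "sequence homology" as percent identity over 
ungapped alignment columns is an assumption, as are the function names and the 
use of "-" as the gap character; the alignment itself is presumed to have been 
produced by a separate tool. 

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are not counted in this sketch
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_threshold(aligned_a, aligned_b, threshold=70.0):
    """True when identity reaches the threshold (70% mirrors the text)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Regions scoring well below the threshold in some strains but not others would be 
candidates for strain-discriminating probes, while regions conserved across all 
strains would be candidates for species-wide probes. 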

C. albicans Polypeptides 

This invention encompasses isolated C. albicans polypeptides encoded by the 
disclosed C. albicans genomic sequences, including the polypeptides of the invention 
contained in the Sequence Listing. Polypeptides of the invention are preferably at 
least about 5 amino acid residues in length. Using the DNA sequence information 
provided herein, the amino acid sequences of the polypeptides encompassed by the 
invention can be deduced using methods well-known in the art. It will be understood 
that the sequence of an entire nucleic acid encoding a C. albicans polypeptide can be 
isolated and identified based on an ORF that encodes only a fragment of the cognate 
protein-coding region. This can be achieved, for example, by using the isolated 
nucleic acid encoding the ORF, or fragments thereof, to prime a polymerase chain 
reaction with genomic C. albicans DNA as template; this is followed by sequencing 
the amplified product. 
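
Deducing an amino acid sequence from a C. albicans ORF must take into account 
the non-universal decoding noted in the Background, in which the leucine codon 
CUG is translated as serine. The following Python sketch builds the standard 
codon table and overrides that single codon; the function name translate, the use 
of "X" for unrecognized codons, and the example sequences are assumptions made 
for illustration. 

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}
# C. albicans decodes the leucine codon CUG (CTG in DNA) as serine.
CODON_TABLE["CTG"] = "S"


def translate(dna, table=CODON_TABLE):
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table.get(dna[i:i + 3], "X")  # X marks codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

With this table, translate("ATGCTGTTGTAA") yields "MSL", whereas the standard 
code would give "MLL" for the same sequence. 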

The polypeptides of the present invention, including function-conservative 
variants of the disclosed ORFs, may be isolated from wild-type or mutant C. albicans 
cells, or from heterologous organisms or cells (including, but not limited to, bacteria, 
fungi, insect, plant, and mammalian cells) including C. albicans into which a 
C. albicans-derived protein-coding sequence has been introduced and expressed. 
Furthermore, the polypeptides may be part of recombinant fusion proteins. 

C. albicans polypeptides of the invention can be chemically synthesized using 
commercially available automated procedures such as those referenced herein, 
including, without limitation, exclusive solid phase synthesis, partial solid phase 
methods, fragment condensation or classical solution synthesis. The polypeptides are 
preferably prepared by solid phase peptide synthesis as described by Merrifield, 1963, 
J. Am. Chem. Soc. 85:2149. The synthesis is carried out with amino acids that are 
protected at the alpha-amino terminus. Trifunctional amino acids with labile 
side-chains are also protected with suitable groups to prevent undesired chemical 
reactions from occurring during the assembly of the polypeptides. The alpha-amino 
protecting group is selectively removed to allow subsequent reaction to take place at 
the amino-terminus. The conditions for the removal of the alpha-amino protecting 
group do not remove the side-chain protecting groups. 

Methods for polypeptide purification are well-known in the art, including, 
without limitation, preparative disc-gel electrophoresis, isoelectric focusing, HPLC, 
reversed-phase HPLC, gel filtration, ion exchange and partition chromatography, and 
countercurrent distribution. For some purposes, it is preferable to produce the 
polypeptide in a recombinant system in which the C. albicans protein contains an 
additional sequence tag that facilitates purification, such as, but not limited to, a 
polyhistidine sequence. The polypeptide can then be purified from a crude lysate of 
the host cell by chromatography on an appropriate solid-phase matrix. Alternatively, 
antibodies produced against a C. albicans protein or against peptides derived 
therefrom can be used as purification reagents. Other purification methods are 
possible. 

The present invention also encompasses derivatives and homologues of 
C. albicans-encoded polypeptides. For some purposes, nucleic acid sequences encoding 
the peptides may be altered by substitutions, additions, or deletions that provide for 
functionally equivalent molecules, i.e., function-conservative variants. For example, 
one or more amino acid residues within the sequence can be substituted by another 
amino acid of similar properties, such as, for example, positively charged amino acids 
(arginine, lysine, and histidine); negatively charged amino acids (aspartate and 
glutamate); polar neutral amino acids; and non-polar amino acids. 
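
The grouping of amino acids by similar properties can be expressed as a lookup, 
as in the Python sketch below. The members of the positively and negatively 
charged classes follow the text; the membership of the polar neutral and non-polar 
classes is not enumerated in this specification and is an assumption here, as are 
the function names. 

```python
CLASSES = {
    "positive": set("RKH"),            # arginine, lysine, histidine (from the text)
    "negative": set("DE"),             # aspartate, glutamate (from the text)
    "polar_neutral": set("STNQCY"),    # assumed membership
    "nonpolar": set("AVLIMFWPG"),      # assumed membership
}


def residue_class(aa):
    """Return the name of the class containing a one-letter residue code."""
    for name, members in CLASSES.items():
        if aa in members:
            return name
    raise ValueError(f"unknown residue: {aa!r}")


def is_conservative(original, replacement):
    """True when both residues fall in the same class."""
    return residue_class(original) == residue_class(replacement)


def conservative_variants(protein, position):
    """All single substitutions at a 1-based position that stay within class."""
    original = protein[position - 1]
    members = CLASSES[residue_class(original)]
    return sorted(
        protein[:position - 1] + aa + protein[position:]
        for aa in members
        if aa != original
    )
```

Whether a given same-class substitution actually preserves function would still 
have to be tested experimentally. 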

The isolated polypeptides may be modified by, for example, phosphorylation, 
sulfation, acylation, or other protein modifications. They may also be modified with a 
label capable of providing a detectable signal, either directly or indirectly, including, 
but not limited to, radioisotopes and fluorescent compounds. 

To identify C. albicans-derived polypeptides for use in the present invention, 
essentially the complete genomic sequence of a C. albicans isolate was analyzed. 
While, in very rare instances, a nucleic acid sequencing error may be revealed, 
resolving a rare sequencing error is well within the art, and such an occurrence will 
not prevent one skilled in the art from practicing the invention. 

Also encompassed are any C. albicans polypeptide sequences that are 
contained within the open reading frames (ORFs), including complete protein-coding 
sequences, of which any of SEQ ID NO: 1 - SEQ ID NO: 14103 forms a part. Table 
2, which is appended herewith and which forms part of the present specification, 
provides a putative identification of the particular function of a polypeptide which is 
encoded by each ORF based on the homology match (determined by the BLAST 
algorithm) of the predicted polypeptide with known proteins encoded by ORFs in 
other organisms. As a result, one skilled in the art can use the polypeptides of the 
present invention for commercial and industrial purposes consistent with the type of 
putative identification of the polypeptide. 

The present invention provides a library of C. albicans-derived polypeptide 
sequences, and a corresponding library of nucleic acid sequences encoding the 
polypeptides, wherein the polypeptides themselves, or polypeptides contained within 
ORFs of which they form a part, comprise sequences that are contemplated for use as 
components of vaccines. Non-limiting examples of such sequences are listed by SEQ 
ID NO in Table 2, which is appended herewith and which forms part of the present 
specification. 

The present invention also provides a library of C. albicans-derived 
polypeptide sequences, and a corresponding library of nucleic acid sequences 
encoding the polypeptides, wherein the polypeptides themselves, or polypeptides 
contained within ORFs of which they form a part, comprise sequences lacking 
homology to any known prokaryotic or eukaryotic sequences. Such libraries provide 
probes, primers, and markers which can be used to diagnose C. albicans infection, 
including use as markers in epidemiological studies. Non-limiting examples of such 
sequences are listed by SEQ ID NO in Table 2, which is appended herewith and 
which forms part of the present specification. 

The present invention also provides a library of C. albicans-derived 
polypeptide sequences, and a corresponding library of nucleic acid sequences 
encoding the polypeptides, wherein the polypeptides themselves, or polypeptides 
contained within ORFs of which they form a part, comprise targets for therapeutic 
drugs. 

Specific Example: Determination Of Candidate Protein Antigens For Antibody And 
Vaccine Development 

The selection of candidate protein antigens for vaccine development can be 
derived from the nucleic acids encoding C. albicans polypeptides. First, the ORFs 
can be analyzed for homology to other known exported or membrane proteins and 
analyzed using the discriminant analysis described by Klein, et al. (Klein, P., 
Kanehisa, M., and DeLisi, C. (1985) Biochimica et Biophysica Acta 815, 468-476) for 
predicting exported and membrane proteins. 

Homology searches can be performed using the BLAST algorithm contained 
in the Wisconsin Sequence Analysis Package (Genetics Computer Group, University 
Research Park, 575 Science Drive, Madison, WI 53711) to compare each predicted 
ORF amino acid sequence with all sequences found in the current GenBank, 
SWISS-PROT and PIR databases. BLAST searches for local alignments between the 
ORF and the databank sequences and reports a probability score which indicates the 
probability of finding this sequence by chance in the database. ORFs with significant 
homology (e.g., probabilities lower than 1 x 10^-6 that the homology is only due to 
random chance) to membrane or exported proteins represent protein antigens for 
vaccine development. Possible functions can be assigned to C. albicans genes based 
on sequence homology to genes cloned in other organisms. 
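
The selection step described above amounts to filtering search results by chance 
probability and by the nature of the matched protein. The Python sketch below 
assumes that the search output has already been parsed into simple records; the Hit 
record, the keyword list, and the example records are hypothetical and serve only 
to show the filtering logic with the 1 x 10^-6 cutoff mentioned in the text. 

```python
from dataclasses import dataclass


@dataclass
class Hit:
    orf_id: str          # hypothetical ORF identifier
    description: str     # description line of the database match
    probability: float   # chance probability reported for the match


# Assumed keywords for recognizing membrane or exported proteins.
KEYWORDS = ("membrane", "secreted", "exported")


def candidate_antigens(hits, cutoff=1e-6, keywords=KEYWORDS):
    """Return ORF ids having a significant match to a membrane or exported protein."""
    selected = []
    for hit in hits:
        text = hit.description.lower()
        significant = hit.probability < cutoff
        relevant = any(word in text for word in keywords)
        if significant and relevant and hit.orf_id not in selected:
            selected.append(hit.orf_id)
    return selected


# Hypothetical records used only to exercise the function.
hits = [
    Hit("orf_A", "plasma membrane ATPase", 3e-40),
    Hit("orf_B", "secreted aspartyl proteinase", 1e-3),
    Hit("orf_C", "cytoplasmic enolase", 1e-80),
    Hit("orf_A", "integral membrane protein", 2e-12),
]
selected = candidate_antigens(hits)
```

Keyword matching on description lines is a crude stand-in for curated annotation 
and would be expected to miss or misclassify some matches. 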

Discriminant analysis (Klein, et al. supra) can be used to examine the ORF 
amino acid sequences. This algorithm uses the intrinsic information contained in the 
ORF amino acid sequence and compares it to information derived from the properties 
of known membrane and exported proteins. This comparison predicts which proteins 
will be exported, membrane associated or cytoplasmic. ORF amino acid sequences 
identified as exported or membrane associated by this algorithm are likely protein 
antigens for vaccine development. 



Production of Fragments and Analogs of C. albicans Nucleic Acids and Polypeptides 

Based on the discovery of the C. albicans gene products of the invention 
provided in the Sequence Listing, one skilled in the art can alter the disclosed 
structure of C. albicans genes, e.g., by producing fragments or analogs, and test the 
newly produced structures for activity. Examples of techniques known to those 
skilled in the relevant art which allow the production and testing of fragments and 
analogs are discussed below. These, or analogous methods, can be used to make and 
screen libraries of polypeptides, e.g., libraries of random peptides or libraries of 
fragments or analogs of cellular proteins, for the ability to bind C. albicans 
polypeptides. Such screens are useful for the identification of inhibitors of 
C. albicans. 

Generation of Fragments 

Fragments of a protein can be produced in several ways, e.g., recombinantly, 
by proteolytic digestion, or by chemical synthesis. Internal or terminal fragments of a 
polypeptide can be generated by removing one or more nucleotides from one end (for 
a terminal fragment) or both ends (for an internal fragment) of a nucleic acid which 
encodes the polypeptide. Expression of the mutagenized DNA produces polypeptide 
fragments. Digestion with "end-nibbling" endonucleases can thus generate DNAs 
which encode an array of fragments. DNAs which encode fragments of a protein can 
also be generated by random shearing, restriction digestion or a combination of the 
above-discussed methods. 

Fragments can also be chemically synthesized using techniques known in the 
art such as conventional Merrifield solid phase Fmoc or t-Boc chemistry. For 
example, peptides of the present invention may be arbitrarily divided into fragments 
of desired length with no overlap of the fragments, or divided into overlapping 
fragments of a desired length. 
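
Dividing a peptide into non-overlapping or overlapping fragments of a desired 
length can be sketched as follows. The function name fragments and its parameters 
are assumptions made for illustration; the final fragment may be shorter than the 
requested length when the peptide does not divide evenly. 

```python
def fragments(peptide, length, overlap=0):
    """Split a peptide into fragments of the given length.

    overlap is the number of residues shared by consecutive fragments;
    overlap=0 gives non-overlapping fragments.
    """
    if length <= 0 or not 0 <= overlap < length:
        raise ValueError("need length > 0 and 0 <= overlap < length")
    step = length - overlap
    result = []
    for start in range(0, len(peptide), step):
        result.append(peptide[start:start + length])
        if start + length >= len(peptide):
            break  # the end of the peptide has been covered
    return result
```

For example, fragments("ABCDEFGHIJ", 4) gives ABCD, EFGH and IJ, while 
fragments("ABCDEFGHIJ", 4, overlap=2) gives ABCD, CDEF, EFGH and GHIJ. 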

Alteration of Nucleic Acids and Polypeptides: Random Methods 

Amino acid sequence variants of a protein can be prepared by random 
mutagenesis of DNA which encodes a protein or a particular domain or region of a 
protein. Useful methods include PCR mutagenesis and saturation mutagenesis. A 
library of random amino acid sequence variants can also be generated by the synthesis 
of a set of degenerate oligonucleotide sequences. (Methods for screening proteins in a 
library of variants are described elsewhere herein.) 

PCR Mutagenesis 

In PCR mutagenesis, reduced Taq polymerase fidelity is used to introduce 
random mutations into a cloned fragment of DNA (Leung et al., 1989, Technique 
1:11-15). The DNA region to be mutagenized is amplified using the polymerase 
chain reaction (PCR) under conditions that reduce the fidelity of DNA synthesis by 
Taq DNA polymerase, e.g., by using a dGTP/dATP ratio of five and adding Mn2+ to 
the PCR reaction. The pool of amplified DNA fragments is inserted into appropriate 
cloning vectors to provide random mutant libraries. 

Saturation Mutagenesis 

Saturation mutagenesis allows for the rapid introduction of a large number of 
single base substitutions into cloned DNA fragments (Myers et al., 1985, Science 
229:242). This technique includes generation of mutations, e.g., by chemical 
treatment or irradiation of single-stranded DNA in vitro, and synthesis of a 
complementary DNA strand. The mutation frequency can be modulated by 
modulating the severity of the treatment, and essentially all possible base substitutions 
can be obtained. Because this procedure does not involve a genetic selection for 
mutant fragments, both neutral substitutions and those that alter function are 
obtained. The distribution of point mutations is not biased toward conserved 
sequence elements. 

Degenerate Oligonucleotides 

A library of homologs can also be generated from a set of degenerate 
oligonucleotide sequences. Chemical synthesis of a degenerate sequence can be 
carried out in an automatic DNA synthesizer, and the synthetic genes then ligated into 
an appropriate expression vector. The synthesis of degenerate oligonucleotides is 
known in the art (see for example, Narang, SA (1983) Tetrahedron 39:3; Itakura et al. 
(1981) Recombinant DNA, Proc 3rd Cleveland Sympos. Macromolecules, ed. AG 
Walton, Amsterdam: Elsevier pp273-289; Itakura et al. (1984) Annu. Rev. Biochem. 
53:323; Itakura et al. (1984) Science 198:1056; Ike et al. (1983) Nucleic Acid Res. 
11:477). Such techniques have been employed in the directed evolution of other 
proteins (see, for example, Scott et al. (1990) Science 249:386-390; Roberts et al. 
(1992) PNAS 89:2429-2433; Devlin et al. (1990) Science 249: 404-406; Cwirla et al. 
(1990) PNAS 87: 6378-6382; as well as U.S. Patents Nos. 5,223,409, 5,198,346, and 
5,096,815). 
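
The size and content of a library specified by a degenerate oligonucleotide can 
be worked out by expanding each ambiguity code into the bases it represents. The 
Python sketch below uses the standard IUPAC nucleotide codes; the function names 
and the NNK example are assumptions made for illustration and are not drawn from 
the cited references. 

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(oligo):
    """Number of distinct sequences encoded by a degenerate oligonucleotide."""
    count = 1
    for base in oligo.upper():
        count *= len(IUPAC[base])
    return count


def expand(oligo):
    """List every distinct sequence encoded by a degenerate oligonucleotide."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(combo) for combo in product(*choices)]
```

For instance, degeneracy("NNK") is 32, so each randomized NNK codon contributes 
a factor of 32 to the number of DNA sequences in the library; expand should be 
used only on short oligonucleotides because its output grows multiplicatively. 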

Alteration of Nucleic Acids and Polypeptides: Methods for Directed Mutagenesis 

Non-random, or directed, mutagenesis techniques can be used to provide 
specific sequences or mutations in specific regions. These techniques can be used to 
create variants which include, e.g., deletions, insertions, or substitutions of residues 
of the known amino acid sequence of a protein. The sites for mutation can be 
modified individually or in series, e.g., by (1) substituting first with conserved amino 
acids and then with more radical choices depending upon results achieved, (2) 
deleting the target residue, or (3) inserting residues of the same or a different class 
adjacent to the located site, or combinations of options 1-3. 

Alanine Scanning Mutagenesis 

Alanine scanning mutagenesis is a useful method for identification of certain 
residues or regions of the desired protein that are preferred locations or domains for 
mutagenesis (Cunningham and Wells, Science 244:1081-1085, 1989). In alanine 
scanning, a residue or group of target residues is identified (e.g., charged residues 
such as Arg, Asp, His, Lys, and Glu) and replaced by a neutral or negatively charged 
amino acid (most preferably alanine or polyalanine). Replacement of an amino acid 
can affect the interaction of the amino acids with the surrounding aqueous 
environment in or outside the cell. Those domains demonstrating functional 
sensitivity to the substitutions are then refined by introducing further or other variants 
at or for the sites of substitution. Thus, while the site for introducing an amino acid 
sequence variation is predetermined, the nature of the mutation per se need not be 
predetermined. For example, to optimize the performance of a mutation at a given 
site, alanine scanning or random mutagenesis may be conducted at the target codon or 
region and the expressed desired protein subunit variants are screened for the optimal 
combination of desired activity. 
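
The set of single-residue variants called for by an alanine scan can be listed 
programmatically. In the Python sketch below, the target residues default to the 
charged residues named in the text (Arg, Asp, His, Lys, Glu); the function name, 
the variant labels of the form K2A, and the example sequence in the note that 
follows are assumptions made for illustration. 

```python
# Charged residues named in the text: Arg, Asp, His, Lys, Glu.
CHARGED = set("RDHKE")


def alanine_scan(protein, targets=CHARGED):
    """Return {label: variant}, one single-alanine variant per target residue.

    Labels use the form <original residue><1-based position>A, e.g. K2A.
    """
    variants = {}
    for index, residue in enumerate(protein):
        if residue in targets and residue != "A":
            label = f"{residue}{index + 1}A"
            variants[label] = protein[:index] + "A" + protein[index + 1:]
    return variants
```

For the hypothetical sequence MKTDL this yields two variants, K2A (MATDL) and 
D4A (MKTAL), each of which would then be expressed and assayed separately. 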

Oligonucleotide-Mediated Mutagenesis 

Oligonucleotide-mediated mutagenesis is a useful method for preparing 
substitution, deletion, and insertion variants of DNA; see, e.g., Adelman et al. (DNA 
2:183, 1983). Briefly, the desired DNA is altered by hybridizing an oligonucleotide 
encoding a mutation to a DNA template, where the template is the single-stranded 
form of a plasmid or bacteriophage containing the unaltered or native DNA sequence 
of the desired protein. After hybridization, a DNA polymerase is used to synthesize 
an entire second complementary strand of the template that will thus incorporate the 
oligonucleotide primer, and will code for the selected alteration in the desired protein 
DNA. Generally, oligonucleotides of at least about 25 nucleotides in length are used. 
An optimal oligonucleotide will have 12 to 15 nucleotides that are completely 
complementary to the template on either side of the nucleotide(s) coding for the 
mutation. This ensures that the oligonucleotide will hybridize properly to the 
single-stranded DNA template molecule. The oligonucleotides are readily synthesized 
using techniques known in the art such as that described by Crea et al. (Proc. Natl. 
Acad. Sci. USA, 75:5765 [1978]). 
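
The design rule stated above (a mutated codon flanked on each side by 12 to 15 
perfectly matching nucleotides) can be sketched in Python as follows. The function 
names, the 0-based codon_start parameter, and the convention that the input is the 
sense strand are assumptions made for illustration; if the single-stranded template 
carries the sense strand, the reverse complement of the designed sequence would be 
the strand that anneals to it. 

```python
def mutagenic_oligo(sense, codon_start, new_codon, flank=15):
    """Return a sense-strand oligonucleotide replacing one codon.

    codon_start is the 0-based index of the first base of the codon to change;
    flank is the number of unchanged nucleotides kept on each side.
    """
    sense = sense.upper()
    new_codon = new_codon.upper()
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three nucleotides long")
    if codon_start < flank or codon_start + 3 + flank > len(sense):
        raise ValueError("not enough flanking sequence on one side")
    left = sense[codon_start - flank:codon_start]
    right = sense[codon_start + 3:codon_start + 3 + flank]
    return left + new_codon + right


def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

With flank=12 the oligonucleotide is 27 nucleotides long and with flank=15 it is 33 
nucleotides long, both of which satisfy the stated minimum of about 25 nucleotides. 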

Cassette Mutagenesis 

Another method for preparing variants, cassette mutagenesis, is based on the 
technique described by Wells et al. (Gene, 34:315 [1985]). The starting material is a 
plasmid (or other vector) which includes the protein subunit DNA to be mutated. The 
codon(s) in the protein subunit DNA to be mutated are identified. There must be a 
unique restriction endonuclease site on each side of the identified mutation site(s). If 
no such restriction sites exist, they may be generated using the above-described 
oligonucleotide-mediated mutagenesis method to introduce them at appropriate 
locations in the desired protein subunit DNA. After the restriction sites have been 
introduced into the plasmid, the plasmid is cut at these sites to linearize it. A 
double-stranded oligonucleotide encoding the sequence of the DNA between the 
restriction sites but containing the desired mutation(s) is synthesized using standard 
procedures. The two strands are synthesized separately and then hybridized together 
using standard techniques. This double-stranded oligonucleotide is referred to as the 
cassette. This cassette is designed to have 3' and 5' ends that are compatible with the 
ends of the linearized plasmid, such that it can be directly ligated to the plasmid. This 
plasmid now contains the mutated desired protein subunit DNA sequence. 

Combinatorial Mutagenesis 

Combinatorial mutagenesis can also be used to generate mutants (Ladner et 
al., WO 88/06630). In this method, the amino acid sequences for a group of 
homologs or other related proteins are aligned, preferably to promote the highest 
homology possible. All of the amino acids which appear at a given position of the 
aligned sequences can be selected to create a degenerate set of combinatorial 
sequences. The variegated library of variants is generated by combinatorial 
mutagenesis at the nucleic acid level, and is encoded by a variegated gene library. 
For example, a mixture of synthetic oligonucleotides can be enzymatically ligated into 
gene sequences such that the degenerate set of potential sequences are expressible as 
individual peptides, or alternatively, as a set of larger fusion proteins containing the 
set of degenerate sequences. 
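
The degenerate set described above can be derived from an alignment by 
collecting the residues observed in each column and taking every combination of 
them. The Python sketch below assumes the homologs have already been aligned to 
equal length with "-" as the gap character; the function names and the three-sequence 
example in the note that follows are hypothetical. 

```python
from itertools import product
from math import prod


def position_sets(aligned):
    """Residues observed at each aligned position, ignoring gap characters."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [sorted(set(column) - {"-"}) for column in zip(*aligned)]


def library_size(aligned):
    """Number of distinct sequences in the combinatorial set."""
    return prod(len(s) for s in position_sets(aligned) if s)


def enumerate_library(aligned):
    """Every combination of the residues observed at each position."""
    sets = [s for s in position_sets(aligned) if s]
    return ["".join(combo) for combo in product(*sets)]
```

For the hypothetical alignment MKTL, MRTL, MKSL, the set contains four 
sequences (MKSL, MKTL, MRSL, MRTL), one of which, MRSL, does not occur 
among the aligned homologs themselves. 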

Other Modifications of C. albicans Nucleic Acids and Polypeptides 

It is possible to modify the structure of a C. albicans polypeptide for such 
purposes as increasing solubility or enhancing stability (e.g., shelf life ex vivo and 
resistance to proteolytic degradation in vivo). A modified C. albicans protein or 
peptide can be produced in which the amino acid sequence has been altered, such as 
by amino acid substitution, deletion, or addition as described herein. 

A C. albicans peptide can also be modified by substitution of cysteine 
residues, preferably with alanine, serine, threonine, leucine or glutamic acid residues, 
to minimize dimerization via disulfide linkages. In addition, amino acid side chains 
of fragments of the protein of the invention can be chemically modified. Another 
modification is cyclization of the peptide. 

In order to enhance stability and/or reactivity, a C. albicans polypeptide can be 
modified to incorporate one or more polymorphisms in the amino acid sequence of 
the protein resulting from any natural allelic variation. Additionally, D-amino acids, 
non-natural amino acids, or non-amino acid analogs can be substituted or added to 
produce a modified protein within the scope of this invention. Furthermore, a 
C. albicans polypeptide can be modified using polyethylene glycol (PEG) according to 
the method of A. Sehon and co-workers (Wie et al., supra) to produce a protein 
conjugated with PEG. In addition, PEG can be added during chemical synthesis of 
the protein. Other modifications of C. albicans proteins include reduction/alkylation 
(Tarr, Methods of Protein Microcharacterization, J. E. Silver ed., Humana Press, 
Clifton NJ 155-194 (1986)); acylation (Tarr, supra); chemical coupling to an 
appropriate carrier (Mishell and Shiigi, eds, Selected Methods in Cellular 
Immunology, WH Freeman, San Francisco, CA (1980), U.S. Patent 4,939,239); or mild 
formalin treatment (Marsh, (1971) Int. Arch. of Allergy and Appl. Immunol., 41: 199 - 
215). 

To facilitate purification and potentially increase solubility of a C. albicans 
protein or peptide, it is possible to add an amino acid fusion moiety to the peptide 
backbone. For example, hexa-histidine can be added to the protein for purification by 
immobilized metal ion affinity chromatography (Hochuli, E. et al., (1988) 
Bio/Technology, 6: 1321 - 1325). In addition, to facilitate isolation of peptides free of 
irrelevant sequences, specific endoprotease cleavage sites can be introduced between 
the sequences of the fusion moiety and the peptide. 

To potentially aid proper antigen processing of epitopes within a C. albicans 
polypeptide, canonical protease sensitive sites can be engineered between regions, 
each comprising at least one epitope, via recombinant or synthetic methods. For 
example, charged amino acid pairs, such as KK or RR, can be introduced between 
regions within a protein or fragment during recombinant construction thereof. The 
resulting peptide can be rendered sensitive to cleavage by cathepsin and/or other 
trypsin-like enzymes which would generate portions of the protein containing one or 
more epitopes. In addition, such charged amino acid residues can result in an increase 
in the solubility of the peptide. 

Primary Methods for Screening Polypeptides and Analogs 

Various techniques are known in the art for screening generated mutant gene 
products. Techniques for screening large gene libraries often include cloning the gene 
library into replicable expression vectors, transforming appropriate cells with the 
resulting library of vectors, and expressing the genes under conditions in which 
detection of a desired activity, e.g., in this case, binding to C. albicans polypeptide or 
an interacting protein, facilitates relatively easy isolation of the vector encoding the 
gene whose product was detected. Each of the techniques described below is 
amenable to high-throughput analysis for screening large numbers of sequences 
created, e.g., by random mutagenesis techniques. 

Two Hybrid Systems 

Two hybrid assays such as the system described below (as with the other 
screening methods described herein), can be used to identify polypeptides, e.g., 
fragments or analogs of a naturally-occurring C. albicans polypeptide, e.g., of cellular 
proteins, or of randomly generated polypeptides which bind to a C. albicans protein. 
(The C. albicans domain is used as the bait protein and the library of variants is 
expressed as prey fusion proteins.) In an analogous fashion, a two hybrid assay (as 
with the other screening methods described herein), can be used to find polypeptides 
which bind a C. albicans polypeptide. 

Display Libraries 

In one approach to screening assays, the candidate peptides are displayed on 
the surface of a cell or viral particle, and the ability of particular cells or viral particles 
to bind an appropriate receptor protein via the displayed product is detected in a 
"panning assay". For example, the gene library can be cloned into the gene for a 
surface membrane protein of a fungal cell, and the resulting fusion protein detected by 
panning (Ladner et al., WO 88/06630; Fuchs et al. (1991) Bio/Technology 9:1370- 
1371; and Goward et al. (1992) TIBS 18:136-140). In a similar fashion, a detectably 
labeled ligand can be used to score for potentially functional peptide homologs. 
Fluorescently labeled ligands, e.g., receptors, can be used to detect homologs which 
retain ligand-binding activity. The use of fluorescently labeled ligands allows cells to 
be visually inspected and separated under a fluorescence microscope, or, where the 
morphology of the cell permits, to be separated by a fluorescence-activated cell sorter. 

A gene library can be expressed as a fusion protein on the surface of a viral 
particle. For instance, in the filamentous phage system, foreign peptide sequences can 
be expressed on the surface of infectious phage, thereby conferring two significant 
benefits. First, since these phage can be applied to affinity matrices at concentrations 
well over 10^13 phage per milliliter, a large number of phage can be screened at one 
time. Second, since each infectious phage displays a gene product on its surface, if a 
particular phage is recovered from an affinity matrix in low yield, the phage can be 
amplified by another round of infection. The group of almost identical E. coli 
filamentous phages, M13, fd, and f1, are most often used in phage display libraries. 
Either of the phage gIII or gVIII coat proteins can be used to generate fusion proteins 
without disrupting the ultimate packaging of the viral particle. Foreign epitopes can 
be expressed at the NH2-terminal end of pIII and phage bearing such epitopes 
recovered from a large excess of phage lacking this epitope (Ladner et al. PCT 
publication WO 90/02909; Garrard et al., PCT publication WO 92/09690; Marks et al. 
(1992) J. Biol. Chem. 267:16007-16010; Griffiths et al. (1993) EMBO J 12:725-734; 
Clackson et al. (1991) Nature 352:624-628; and Barbas et al. (1992) PNAS 89:4457- 
4461). 

A common approach uses the maltose receptor of E. coli (the outer membrane
protein, LamB) as a peptide fusion partner (Charbit et al. (1986) EMBO J. 5, 3029-3037).
Oligonucleotides have been inserted into plasmids encoding the LamB gene to
produce peptides fused into one of the extracellular loops of the protein. These
peptides are available for binding to ligands, e.g., to antibodies, and can elicit an
immune response when the cells are administered to animals. Other cell surface
proteins, e.g., OmpA (Schorr et al. (1991) Vaccines 91, pp. 387-392), PhoE
(Agterberg, et al. (1990) Gene 88, 37-45), and PAL (Fuchs et al. (1991) Bio/Tech 9,
1369-1372), as well as large bacterial surface structures have served as vehicles for
peptide display. Peptides can be fused to pilin, a protein which polymerizes to form
the pilus, a conduit for interbacterial exchange of genetic information (Thiry et al.
(1989) Appl. Environ. Microbiol. 55, 984-993). Because of its role in interacting with
other cells, the pilus provides a useful support for the presentation of peptides to the
extracellular environment. Another large surface structure used for peptide display is
the bacterial motive organ, the flagellum. Fusion of peptides to the subunit protein
flagellin offers a dense array of many peptide copies on the host cells (Kuwajima et
al. (1988) Bio/Tech. 6, 1080-1083). Surface proteins of other bacterial species have
also served as peptide fusion partners. Examples include the Staphylococcus protein
A and the outer membrane IgA protease of Neisseria (Hansson et al. (1992) J.
Bacteriol. 174, 4239-4245 and Klauser et al. (1990) EMBO J. 9, 1991-1999).

In the filamentous phage systems and the LamB system described above, the
physical link between the peptide and its encoding DNA occurs by the containment of
the DNA within a particle (cell or phage) that carries the peptide on its surface.
Capturing the peptide captures the particle and the DNA within. An alternative
scheme uses the DNA-binding protein LacI to form a link between peptide and DNA
(Cull et al. (1992) PNAS USA 89:1865-1869). This system uses a plasmid containing
the LacI gene with an oligonucleotide cloning site at its 3'-end. Under the controlled
induction by arabinose, a LacI-peptide fusion protein is produced. This fusion retains
the natural ability of LacI to bind to a short DNA sequence known as the LacO operator
(LacO). By installing two copies of LacO on the expression plasmid, the LacI-peptide
fusion binds tightly to the plasmid that encoded it. Because the plasmids in
each cell contain only a single oligonucleotide sequence and each cell expresses only
a single peptide sequence, the peptides become specifically and stably associated
with the DNA sequence that directed their synthesis. The cells of the library are gently
lysed and the peptide-DNA complexes are exposed to a matrix of immobilized
receptor to recover the complexes containing active peptides. The associated plasmid
DNA is then reintroduced into cells for amplification and DNA sequencing to
determine the identity of the peptide ligands. As a demonstration of the practical
utility of the method, a large random library of dodecapeptides was made and selected
on a monoclonal antibody raised against the opioid peptide dynorphin B. A cohort of
peptides was recovered, all related by a consensus sequence corresponding to a
six-residue portion of dynorphin B (Cull et al. (1992) Proc. Natl. Acad. Sci. U.S.A.
89:1865-1869).

This scheme, sometimes referred to as peptides-on-plasmids, differs in two
important ways from the phage display methods. First, the peptides are attached to
the C-terminus of the fusion protein, resulting in the display of the library members as
peptides having free carboxy termini. Both of the filamentous phage coat proteins,
pIII and pVIII, are anchored to the phage through their C-termini, and the guest
peptides are placed into the outward-extending N-terminal domains. In some designs,
the phage-displayed peptides are presented right at the amino terminus of the fusion
protein (Cwirla, et al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87, 6378-6382). A second
difference is the set of biological biases affecting the population of peptides actually
present in the libraries. The LacI fusion molecules are confined to the cytoplasm of
the host cells. The phage coat fusions are exposed briefly to the cytoplasm during
translation but are rapidly secreted through the inner membrane into the periplasmic
compartment, remaining anchored in the membrane by their C-terminal hydrophobic
domains, with the N-termini, containing the peptides, protruding into the periplasm
while awaiting assembly into phage particles. The peptides in the LacI and phage
libraries may differ significantly as a result of their exposure to different proteolytic
activities. The phage coat proteins require transport across the inner membrane and
signal peptidase processing as a prelude to incorporation into phage. Certain peptides
exert a deleterious effect on these processes and are underrepresented in the libraries
(Gallop et al. (1994) J. Med. Chem. 37(9):1233-1251). These particular biases are not
a factor in the LacI display system.

The number of small peptides available in recombinant random libraries is
enormous. Libraries of 10^7-10^9 independent clones are routinely prepared. Libraries
as large as 10^11 recombinants have been created, but this size approaches the practical
limit for clone libraries. This limitation in library size occurs at the step of
transforming the DNA containing randomized segments into the host bacterial cells.
To circumvent this limitation, an in vitro system based on the display of nascent
peptides in polysome complexes has recently been developed. This display library
method has the potential of producing libraries 3-6 orders of magnitude larger than
the currently available phage/phagemid or plasmid libraries. Furthermore, the
construction of the libraries, expression of the peptides, and screening are done in an
entirely cell-free format.
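
The scale argument above can be made concrete with a little arithmetic. The
following Python sketch is offered only as an illustration: it assumes that each
randomized position may hold any of the 20 standard amino acids, and the function
names are hypothetical rather than part of any described method. It compares the
number of possible peptides of a given length with a library of a given size.

```python
import math

# Assumption: every randomized position may be any of the 20 standard residues.
STANDARD_AMINO_ACIDS = 20


def sequence_space(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return STANDARD_AMINO_ACIDS ** length


def max_fraction_sampled(library_size: float, length: int) -> float:
    """Upper bound on the fraction of sequence space a library can contain.

    This is an upper bound because real libraries contain duplicate clones.
    """
    return min(1.0, library_size / sequence_space(length))


def orders_of_magnitude(larger: float, smaller: float) -> float:
    """Base-10 logarithm of the ratio between two library sizes."""
    return math.log10(larger / smaller)


if __name__ == "__main__":
    # A clone library of 10^9 members can at most cover hexapeptide space once,
    # whereas even 10^12 members cover only part of decapeptide space.
    print(max_fraction_sampled(1e9, 6))
    print(max_fraction_sampled(1e12, 10))
    print(orders_of_magnitude(1e12, 1e9))
```

Under these assumptions, a 10^12-member library is three orders of magnitude larger
than a 10^9-member clone library, yet still cannot contain every possible decapeptide.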

In one application of this method (Gallop et al. (1994) J. Med. Chem.
37(9):1233-1251), a molecular DNA library encoding 10^12 decapeptides was
constructed and the library expressed in an E. coli S30 in vitro coupled
transcription/translation system. Conditions were chosen to stall the ribosomes on the
mRNA, causing the accumulation of a substantial proportion of the RNA in
polysomes and yielding complexes containing nascent peptides still linked to their
encoding RNA. The polysomes are sufficiently robust to be affinity purified on
immobilized receptors in much the same way as the more conventional recombinant
peptide display libraries are screened. RNA from the bound complexes is recovered,
converted to cDNA, and amplified by PCR to produce a template for the next round
of synthesis and screening. The polysome display method can be coupled to the
phage display system. Following several rounds of screening, cDNA from the
enriched pool of polysomes was cloned into a phagemid vector. This vector serves as
both a peptide expression vector, displaying peptides fused to the coat proteins, and as
a DNA sequencing vector for peptide identification. By expressing the
polysome-derived peptides on phage, one can either continue the affinity selection
procedure in this format or assay the peptides on individual clones for binding activity
in a phage ELISA, or for binding specificity in a competition phage ELISA (Barrett, et al.
(1992) Anal. Biochem. 204, 357-364). To identify the sequences of the active peptides one
sequences the DNA produced by the phagemid host.

Secondary Screening of Polypeptides and Analogs 

The high-throughput assays described above can be followed by secondary
screens in order to identify further biological activities which will, e.g., allow one
skilled in the art to differentiate agonists from antagonists. The type of secondary
screen used will depend on the desired activity that needs to be tested. For example,
an assay can be developed in which the ability to inhibit an interaction between a
protein of interest and its respective ligand can be used to identify antagonists from a
group of peptide fragments isolated through one of the primary screens described
above.

Therefore, methods for generating fragments and analogs and testing them for 
activity are known in the art. Once the core sequence of interest is identified, it is 
routine for one skilled in the art to obtain analogs and fragments. 

Peptide Mimetics of C. albicans Polypeptides

The invention also provides for reduction of the protein binding domains of
the subject C. albicans polypeptides to generate mimetics, e.g. peptide or non-peptide
agents. The peptide mimetics are able to disrupt binding of a polypeptide to its
counter ligand, e.g., in the case of a C. albicans polypeptide binding to a naturally
occurring ligand. The critical residues of a subject C. albicans polypeptide which are
involved in molecular recognition of a polypeptide can be determined and used to
generate C. albicans-derived peptidomimetics which competitively or
noncompetitively inhibit binding of the C. albicans polypeptide with an interacting
polypeptide (see, for example, European patent applications EP-412,762A and
EP-B31,080A).

For example, scanning mutagenesis can be used to map the amino acid
residues of a particular C. albicans polypeptide involved in binding an interacting
polypeptide; peptidomimetic compounds (e.g. diazepine or isoquinoline derivatives)
can then be generated which mimic those residues in binding to an interacting polypeptide,
and which therefore can inhibit binding of a C. albicans polypeptide to an interacting
polypeptide and thereby interfere with the function of the C. albicans polypeptide. For
instance, non-hydrolyzable peptide analogs of such residues can be generated using
benzodiazepine (e.g., see Freidinger et al. in Peptides: Chemistry and Biology, G.R.
Marshall ed., ESCOM Publisher: Leiden, Netherlands, 1988), azepine (e.g., see
Huffman et al. in Peptides: Chemistry and Biology, G.R. Marshall ed., ESCOM
Publisher: Leiden, Netherlands, 1988), substituted gamma lactam rings (Garvey et al. in
Peptides: Chemistry and Biology, G.R. Marshall ed., ESCOM Publisher: Leiden,
Netherlands, 1988), keto-methylene pseudopeptides (Ewenson et al. (1986) J Med
Chem 29:295; and Ewenson et al. in Peptides: Structure and Function (Proceedings
of the 9th American Peptide Symposium) Pierce Chemical Co. Rockland, IL, 1985),
β-turn dipeptide cores (Nagai et al. (1985) Tetrahedron Lett 26:647; and Sato et al.
(1986) J Chem Soc Perkin Trans 1:1231), and β-aminoalcohols (Gordon et al. (1985)
Biochem Biophys Res Commun 126:419; and et al. (1986) Biochem Biophys Res
Commun 134:71).

Vaccine Formulations for C. albicans Nucleic Acids and Polypeptides

This invention also features vaccine compositions for protection against
infection by C. albicans or for treatment of C. albicans infection. In one embodiment,
the vaccine compositions contain one or more immunogenic components such as a
surface protein from C. albicans, or portion thereof, and a pharmaceutically
acceptable carrier. Nucleic acids within the scope of the invention are exemplified by
the nucleic acids of the invention contained in the Sequence Listing which encode C.
albicans surface proteins. Any nucleic acid encoding an immunogenic C. albicans
protein, or portion thereof, which is capable of expression in a cell, can be used in the
present invention. These vaccines have therapeutic and prophylactic utilities.

One aspect of the invention provides a vaccine composition for protection
against infection by C. albicans which contains at least one immunogenic fragment of
a C. albicans protein and a pharmaceutically acceptable carrier. Preferred fragments
include peptides of at least about 10 amino acid residues in length, preferably about
10-20 amino acid residues in length, and more preferably about 12-16 amino acid
residues in length.
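
As an illustration of the fragment lengths just described, the following Python
sketch enumerates every contiguous fragment of a protein sequence within a chosen
length range (12-16 residues by default, matching the more preferred range above).
It is a minimal sketch only: the function name and the example sequence are
hypothetical and are not taken from the Sequence Listing.

```python
from typing import Iterator, Tuple


def peptide_fragments(
    sequence: str, min_len: int = 12, max_len: int = 16
) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, fragment) for every contiguous window of the sequence.

    Windows of each length from min_len to max_len (inclusive) are produced.
    A sequence shorter than min_len yields nothing.
    """
    for length in range(min_len, max_len + 1):
        for start in range(len(sequence) - length + 1):
            yield start, sequence[start:start + length]


if __name__ == "__main__":
    # Hypothetical 22-residue sequence used purely for demonstration.
    example = "MKTAYIAKQRQISFVKSHFSRQ"
    fragments = list(peptide_fragments(example))
    print(len(fragments), fragments[0])
```

Each candidate produced this way would still need to be synthesized or expressed and
tested for immunogenicity; the enumeration only defines the search space.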

Immunogenic components of the invention can be obtained, for example, by
screening polypeptides recombinantly produced from the corresponding fragment of
the nucleic acid encoding the full-length C. albicans protein. In addition, fragments
can be chemically synthesized using techniques known in the art such as conventional
Merrifield solid phase f-Moc or t-Boc chemistry.

In one embodiment, immunogenic components are identified by the ability of
the peptide to stimulate T cells. Peptides which stimulate T cells, as determined by,
for example, T cell proliferation or cytokine secretion, are defined herein as
comprising at least one T cell epitope. T cell epitopes are believed to be involved in
initiation and perpetuation of the immune response to the protein allergen which is
responsible for the clinical symptoms of allergy. These T cell epitopes are thought to
trigger early events at the level of the T helper cell by binding to an appropriate HLA
molecule on the surface of an antigen presenting cell, thereby stimulating the T cell
subpopulation with the relevant T cell receptor for the epitope. These events lead to T
cell proliferation, lymphokine secretion, local inflammatory reactions, recruitment of
additional immune cells to the site of antigen/T cell interaction, and activation of the
B cell cascade, leading to the production of antibodies. A T cell epitope is the basic
element, or smallest unit of recognition by a T cell receptor, where the epitope
comprises amino acids essential to receptor recognition (e.g., approximately 6 or 7
amino acid residues). Amino acid sequences which mimic those of the T cell epitopes
are within the scope of this invention.

Screening immunogenic components can be accomplished using one or more
of several different assays. For example, in vitro, peptide T cell stimulatory activity is
assayed by contacting a peptide known or suspected of being immunogenic with an
antigen presenting cell which presents appropriate MHC molecules in a T cell culture.
Presentation of an immunogenic C. albicans peptide in association with appropriate
MHC molecules to T cells in conjunction with the necessary co-stimulation has the
effect of transmitting a signal to the T cell that induces the production of increased
levels of cytokines, particularly of interleukin-2 and interleukin-4. The culture
supernatant can be obtained and assayed for interleukin-2 or other known cytokines.
For example, any one of several conventional assays for interleukin-2 can be
employed, such as the assay described in Proc. Natl. Acad. Sci. USA, 86: 1333 (1989),
the pertinent portions of which are incorporated herein by reference. A kit for an
assay for the production of interferon is also available from Genzyme Corporation
(Cambridge, MA).

Alternatively, a common assay for T cell proliferation entails measuring
tritiated thymidine incorporation. The proliferation of T cells can be measured in
vitro by determining the amount of 3H-labeled thymidine incorporated into the
replicating DNA of cultured cells. Therefore, the rate of DNA synthesis and, in turn,
the rate of cell division can be quantified.
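
One common way to summarize thymidine incorporation data is a stimulation index,
i.e., the ratio of mean counts per minute (cpm) in peptide-stimulated wells to mean
cpm in unstimulated control wells. The text above does not prescribe this metric;
the Python sketch below assumes it purely for illustration, and the function name
and example counts are hypothetical.

```python
from statistics import mean
from typing import Sequence


def stimulation_index(
    stimulated_cpm: Sequence[float], control_cpm: Sequence[float]
) -> float:
    """Ratio of mean cpm in stimulated wells to mean cpm in control wells.

    Assumption: replicate wells are simply averaged, with no background
    subtraction or outlier handling.
    """
    control = mean(control_cpm)
    if control <= 0:
        raise ValueError("mean control cpm must be positive")
    return mean(stimulated_cpm) / control


if __name__ == "__main__":
    # Hypothetical triplicate wells for one candidate peptide.
    print(stimulation_index([3000, 3300, 2700], [1000, 900, 1100]))
```

Whatever cutoff is applied to such an index to call a peptide stimulatory is a matter
of experimental design and is not specified here.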

Vaccine compositions of the invention containing immunogenic components
(e.g., C. albicans polypeptide or fragment thereof or nucleic acid encoding a C.
albicans polypeptide or fragment thereof) preferably include a pharmaceutically
acceptable carrier. The term "pharmaceutically acceptable carrier" refers to a carrier
that does not cause an allergic reaction or other untoward effect in patients to whom it
is administered. Suitable pharmaceutically acceptable carriers include, for example,
one or more of water, saline, phosphate buffered saline, dextrose, glycerol, ethanol
and the like, as well as combinations thereof. Pharmaceutically acceptable carriers
may further comprise minor amounts of auxiliary substances such as wetting or
emulsifying agents, preservatives or buffers, which enhance the shelf life or
effectiveness of the antibody. For vaccines of the invention containing C. albicans
polypeptides, the polypeptide is co-administered with a suitable adjuvant.

It will be apparent to those of skill in the art that the therapeutically effective
amount of DNA or protein of this invention will depend, inter alia, upon the
administration schedule, the unit dose of antibody administered, whether the protein
or DNA is administered in combination with other therapeutic agents, the immune
status and health of the patient, and the therapeutic activity of the particular protein or
DNA.

Vaccine compositions are conventionally administered parenterally, e.g., by
injection, either subcutaneously or intramuscularly. Methods for intramuscular
immunization are described by Wolff et al. (1990) Science 247: 1465-1468 and by
Sedegah et al. (1994) Immunology 91: 9866-9870. Other modes of administration
include oral and pulmonary formulations, suppositories, and transdermal applications.
Oral immunization is preferred over parenteral methods for inducing protection
against infection by C. albicans (Cain et al. (1993) Vaccine 11: 637-642). Oral
formulations include such normally employed excipients as, for example,
pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium
saccharine, cellulose, magnesium carbonate, and the like.

The vaccine compositions of the invention can include an adjuvant, including,
but not limited to, aluminum hydroxide; N-acetyl-muramyl-L-threonyl-D-isoglutamine
(thr-MDP); N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine (CGP
11637, referred to as nor-MDP); N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-
alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine (CGP
19835A, referred to as MTP-PE); RIBI, which contains three components from
bacteria, monophosphoryl lipid A, trehalose dimycolate, and cell wall skeleton (MPL +
TDM + CWS), in a 2% squalene/Tween 80 emulsion; and cholera toxin. Others which
may be used are non-toxic derivatives of cholera toxin, including its B subunit, and/or
conjugates or genetically engineered fusions of the C. albicans polypeptide with
cholera toxin or its B subunit, procholeragenoid, fungal polysaccharides, including
schizophyllan, muramyl dipeptide, muramyl dipeptide derivatives, phorbol esters,
labile toxin of E. coli, non-C. albicans fungal lysates, block polymers or saponins.

Other suitable delivery methods include biodegradable microcapsules or
immuno-stimulating complexes (ISCOMs), cochleates, or liposomes, genetically
engineered attenuated live vectors such as viruses or bacteria, and recombinant
(chimeric) virus-like particles, e.g., bluetongue. The amount of adjuvant employed
will depend on the type of adjuvant used. For example, when the mucosal adjuvant is
cholera toxin, it is suitably used in an amount of 5 mg to 50 mg, for example 10 mg to
35 mg. When used in the form of microcapsules, the amount used will depend on the
amount employed in the matrix of the microcapsule to achieve the desired dosage.
The determination of this amount is within the skill of a person of ordinary skill in the
art.

Carrier systems in humans may include enteric release capsules protecting the
antigen from the acidic environment of the stomach, and including C. albicans
polypeptide in an insoluble form as fusion proteins. Suitable carriers for the vaccines
of the invention are enteric coated capsules and polylactide-glycolide microspheres.
Suitable diluents are 0.2 N NaHCO3 and/or saline.

Vaccines of the invention can be administered as a primary prophylactic agent
in adults or in children, as a secondary prevention, after successful eradication of C.
albicans in an infected host, or as a therapeutic agent with the aim of inducing an immune
response in a susceptible host to prevent infection by C. albicans. The vaccines of the
invention are administered in amounts readily determined by persons of ordinary skill
in the art. Thus, for adults a suitable dosage will be in the range of 10 mg to 10 g,
preferably 10 mg to 100 mg. A suitable dosage for adults will also be in the range of
5 mg to 500 mg. Similar dosage ranges will be applicable for children. Those skilled
in the art will recognize that the optimal dose may be more or less depending upon the
patient's body weight, disease, the route of administration, and other factors. Those
skilled in the art will also recognize that appropriate dosage levels can be obtained
based on results with known oral vaccines such as, for example, a vaccine based on an
E. coli lysate (6 mg dose daily up to total of 540 mg) and with an enterotoxigenic E.
coli purified antigen (4 doses of 1 mg) (Schulman et al., J. Urol. 150:917-921 (1993);
Boedecker et al., American Gastroenterological Assoc. 999:A-222 (1993)). The
number of doses will depend upon the disease, the formulation, and efficacy data
from clinical trials. Without intending any limitation as to the course of treatment, the
treatment can be administered over 3 to 8 doses for a primary immunization schedule
over 1 month (Boedeker, American Gastroenterological Assoc. 888:A-222 (1993)).

In a preferred embodiment, a vaccine composition of the invention can be
based on a killed whole E. coli preparation with an immunogenic fragment of a C.
albicans protein of the invention expressed on its surface or it can be based on an E.
coli lysate, wherein the killed E. coli acts as a carrier or an adjuvant.

It will be apparent to those skilled in the art that some of the vaccine
compositions of the invention are useful only for preventing C. albicans infection,
some are useful only for treating C. albicans infection, and some are useful for both
preventing and treating C. albicans infection. In a preferred embodiment, the vaccine
composition of the invention provides protection against C. albicans infection by
stimulating humoral and/or cell-mediated immunity against C. albicans. It should be
understood that amelioration of any of the symptoms of C. albicans infection is a
desirable clinical goal, including a lessening of the dosage of medication used to treat
C. albicans-caused disease, or an increase in the production of antibodies in the serum
or mucus of patients.

Antibodies Reactive With C. albicans Polypeptides 

The invention also includes antibodies specifically reactive with the subject C.
albicans polypeptide. Anti-protein/anti-peptide antisera or monoclonal antibodies can
be made by standard protocols (see, for example, Antibodies: A Laboratory Manual
ed. by Harlow and Lane (Cold Spring Harbor Press: 1988)). A mammal such as a
mouse, a hamster or rabbit can be immunized with an immunogenic form of the
peptide. Techniques for conferring immunogenicity on a protein or peptide include
conjugation to carriers or other techniques well known in the art. An immunogenic
portion of the subject C. albicans polypeptide can be administered in the presence of
adjuvant. The progress of immunization can be monitored by detection of antibody
titers in plasma or serum. Standard ELISA or other immunoassays can be used with
the immunogen as antigen to assess the levels of antibodies.

In a preferred embodiment, the subject antibodies are immunospecific for
antigenic determinants of the C. albicans polypeptides of the invention, e.g. antigenic
determinants of a polypeptide of the invention contained in the Sequence Listing, or a
closely related human or non-human mammalian homolog (e.g., 90% homologous,
more preferably at least about 95% homologous). In yet a further preferred
embodiment of the invention, the anti-C. albicans antibodies do not substantially
cross react (i.e., react specifically) with a protein which is, for example, less than 80
percent homologous to a sequence of the invention contained in the Sequence
Listing. By "not substantially cross react", it is meant that the antibody has a binding
affinity for a non-homologous protein which is less than 10 percent, more preferably
less than 5 percent, and even more preferably less than 1 percent, of the binding
affinity for a protein of the invention contained in the Sequence Listing. In a most
preferred embodiment, there is no cross-reactivity between fungal and mammalian
antigens.
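
The cross-reactivity thresholds stated above (less than 10, 5, and 1 percent of the
binding affinity for the target) can be expressed as a simple calculation. The Python
sketch below is illustrative only. It assumes that affinity is reported as an
association constant, so that a larger value means tighter binding; the function
names, tier labels, and example values are hypothetical.

```python
def relative_cross_reactivity(off_target_affinity: float, target_affinity: float) -> float:
    """Off-target binding affinity as a percentage of the on-target affinity.

    Assumption: both values are association constants in the same units.
    """
    if target_affinity <= 0:
        raise ValueError("target affinity must be positive")
    return 100.0 * off_target_affinity / target_affinity


def cross_reactivity_tier(percent: float) -> str:
    """Classify a relative affinity against the 10 / 5 / 1 percent thresholds."""
    if percent < 1.0:
        return "less than 1 percent (even more preferred)"
    if percent < 5.0:
        return "less than 5 percent (more preferred)"
    if percent < 10.0:
        return "less than 10 percent (not substantially cross-reactive)"
    return "substantially cross-reactive"


if __name__ == "__main__":
    # Hypothetical association constants, in 1/M.
    pct = relative_cross_reactivity(2e6, 1e9)
    print(round(pct, 3), cross_reactivity_tier(pct))
```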

The term antibody as used herein is intended to include fragments thereof
which are also specifically reactive with C. albicans polypeptides. Antibodies can be
fragmented using conventional techniques and the fragments screened for utility in
the same manner as described above for whole antibodies. For example, F(ab')2
fragments can be generated by treating antibody with pepsin. The resulting F(ab')2
fragment can be treated to reduce disulfide bridges to produce Fab' fragments. The
antibody of the invention is further intended to include bispecific and chimeric
molecules having an anti-C. albicans portion.

Both monoclonal and polyclonal antibodies (Ab) directed against C. albicans
polypeptides or C. albicans polypeptide variants, and antibody fragments such as Fab'
and F(ab')2, can be used to block the action of a C. albicans polypeptide and allow the
study of the role of a particular C. albicans polypeptide of the invention in aberrant or
unwanted intracellular signaling, as well as the normal cellular function of the C.
albicans polypeptide, e.g., by microinjection of anti-C. albicans polypeptide antibodies
of the present invention.

Antibodies which specifically bind C. albicans epitopes can also be used in
immunohistochemical staining of tissue samples in order to evaluate the abundance
and pattern of expression of C. albicans antigens. Anti-C. albicans polypeptide
antibodies can be used diagnostically in immuno-precipitation and immuno-blotting
to detect and evaluate C. albicans levels in tissue or bodily fluid as part of a clinical
testing procedure. Likewise, the ability to monitor C. albicans polypeptide levels in
an individual can allow determination of the efficacy of a given treatment regimen for
an individual afflicted with such a disorder. The level of a C. albicans polypeptide
can be measured in cells found in bodily fluid, such as in urine samples, or can be
measured in tissue, such as produced by gastric biopsy. Diagnostic assays using
anti-C. albicans antibodies can include, for example, immunoassays designed to aid in
early diagnosis of C. albicans infections. The present invention can also be used as a
method of detecting antibodies contained in samples from individuals infected by this
fungus using specific C. albicans antigens.

Another application of anti-C. albicans polypeptide antibodies of the invention
is in the immunological screening of cDNA libraries constructed in expression vectors
such as λgt11, λgt18-23, λZAP, and λORF8. Messenger libraries of this type, having
coding sequences inserted in the correct reading frame and orientation, can produce
fusion proteins. For instance, λgt11 will produce fusion proteins whose amino termini
consist of β-galactosidase amino acid sequences and whose carboxy termini consist of
a foreign polypeptide. Antigenic epitopes of a subject C. albicans polypeptide can
then be detected with antibodies, as, for example, by reacting nitrocellulose filters lifted
from infected plates with anti-C. albicans polypeptide antibodies. Phage, scored by
this assay, can then be isolated from the infected plate. Thus, the presence of C.
albicans gene homologs can be detected and cloned from other species, and alternate
isoforms (including splicing variants) can be detected and cloned.

Bio chip Technology

The nucleic acid sequences or fragments thereof of the present invention lend
themselves to the detection of nucleic acid sequences or fragments thereof of C.
albicans or other species of Candida using nanotechnology apparatus, compositions
and methods, referred to generically herein as "bio chip" technology. Bio chips
containing arrays of nucleic acid sequence can also be used to measure expression of
genes of C. albicans or other species of Candida. For example, to diagnose a patient
with a C. albicans or other Candida infection, a sample from a human or animal can
be used as a probe on a bio chip containing an array of nucleic acid sequence from the
present invention. In addition, a sample from a disease state can be compared to a
sample from a non-disease state which would help identify a gene that is up-regulated
or expressed in the disease state. This would provide valuable insight as to the
mechanism by which the disease manifests. Changes in gene expression can also be
used to identify critical pathways involved in drug transport or metabolism, and may
enable the identification of novel targets involved in virulence or host cell interactions
involved in maintenance of an infection. Procedures using such techniques have been
described by Brown et al., 1995, Science 270: 467-470.
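
The comparison of a disease-state sample with a non-disease-state sample can be
sketched computationally. The Python example below is a minimal illustration rather
than a described procedure: it assumes that each sample has been reduced to one
signal intensity per gene, and it flags genes whose log2 ratio of disease to control
signal meets a chosen cutoff. The function name, the gene names, the pseudocount, and
the default cutoff of a two-fold change are all assumptions made for the example.

```python
import math
from typing import Dict


def upregulated_genes(
    disease: Dict[str, float],
    control: Dict[str, float],
    min_log2_fold: float = 1.0,
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Return genes whose log2(disease / control) is at least min_log2_fold.

    The pseudocount is added to both signals so that genes with zero signal
    in one sample do not cause a division by zero. Genes missing from the
    control sample are skipped.
    """
    hits = {}
    for gene, disease_signal in disease.items():
        if gene not in control:
            continue
        ratio = (disease_signal + pseudocount) / (control[gene] + pseudocount)
        log2_fold = math.log2(ratio)
        if log2_fold >= min_log2_fold:
            hits[gene] = log2_fold
    return hits


if __name__ == "__main__":
    # Hypothetical array intensities for three hypothetical genes.
    disease_sample = {"geneA": 799.0, "geneB": 100.0, "geneC": 50.0}
    control_sample = {"geneA": 99.0, "geneB": 100.0, "geneC": 199.0}
    print(upregulated_genes(disease_sample, control_sample))
```

In practice, array data would also need normalization and replicate measurements
before any gene is called up-regulated; the sketch omits both.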

Bio chip technology can also be used to monitor genetic changes associated with
potential therapeutic compounds, including deletions, insertions or mismatches. Once
the therapeutic is administered to the patient, changes to the genetic sequence can be
evaluated to assess its efficacy. In addition, the nucleic acid sequences of the present
invention can be used to determine essential genes in cell cycling. As described in
Iyer et al., 1999 (Science, 283:83-87), genes essential in the cell cycle can be
identified using bio chips. Furthermore, the present invention provides nucleic acid
sequences which can be used with bio chip technology to understand regulatory
networks in bacteria, measure the response to environmental signals or drugs as in
drug screening, and study virulence induction (Mons et al., 1998, Nature
Biotechnology, 16: 45-48). Patents teaching this technology include U.S. Patents
5445934, 5744305, and 5800992.



Kits Containing Nucleic Acids, Polypeptides or Antibodies of the Invention

The nucleic acids, polypeptides and antibodies of the invention can be
conveniently combined with other reagents and articles to form kits. Kits for
diagnostic purposes typically comprise the nucleic acids, polypeptides or antibodies in
vials or other suitable vessels. Kits typically comprise other reagents for performing
hybridization reactions, polymerase chain reactions (PCR), or for reconstitution of
lyophilized components, such as aqueous media, salts, buffers, and the like. Kits may
also comprise reagents for sample processing such as detergents, chaotropic salts and
the like. Kits may also comprise immobilization means such as particles, supports,
wells, dipsticks and the like. Kits may also comprise labeling means such as dyes,
developing reagents, radioisotopes, fluorescent agents, luminescent or
chemiluminescent agents, enzymes, intercalating agents and the like. With the nucleic
acid and amino acid sequence information provided herein, individuals skilled in the art
can readily assemble kits to serve their particular purpose. Kits further can include
instructions for use.

Drug Screening Assays Using C. albicans Polypeptides 

By making available purified and recombinant C. albicans polypeptides, the
present invention provides assays which can be used to screen for drugs which are
either agonists or antagonists of the normal cellular function, in this case, of the
subject C. albicans polypeptides, or of their role in intracellular signaling. Such
inhibitors or potentiators may be useful as new therapeutic agents to combat C.
albicans infections in humans. A variety of assay formats will suffice and, in light of
the present inventions, will be comprehended by the person skilled in the art.

In many drug screening programs which test libraries of compounds and
natural extracts, high throughput assays are desirable in order to maximize the number
of compounds surveyed in a given period of time. Assays which are performed in
cell-free systems, such as may be derived with purified or semi-purified proteins, are
often preferred as "primary" screens in that they can be generated to permit rapid
development and relatively easy detection of an alteration in a molecular target which
is mediated by a test compound. Moreover, the effects of cellular toxicity and/or
bioavailability of the test compound can be generally ignored in the in vitro system,
the assay instead being focused primarily on the effect of the drug on the molecular
target as may be manifest in an alteration of binding affinity with other proteins or
change in enzymatic properties of the molecular target. Accordingly, in an exemplary
screening assay of the present invention, the compound of interest is contacted with
an isolated and purified C. albicans polypeptide.

Screening assays can be constructed in vitro with a purified C. albicans
polypeptide or fragment thereof, such as a C. albicans polypeptide having enzymatic
activity, such that the activity of the polypeptide produces a detectable reaction
product. The efficacy of the compound can be assessed by generating dose response
curves from data obtained using various concentrations of the test compound.
Moreover, a control assay can also be performed to provide a baseline for
comparison. Suitable products include those with distinctive absorption,
fluorescence, or chemiluminescence properties, for example, because detection may
be easily automated. A variety of synthetic or naturally occurring compounds can be
tested in the assay to identify those which inhibit or potentiate the activity of the C.
albicans polypeptide. Some of these active compounds may directly, or with
chemical alterations to promote membrane permeability or solubility, also inhibit or
potentiate the same activity (e.g., enzymatic activity) in whole, live C. albicans cells.
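
As an illustration of how a dose response curve can be reduced to a single potency
figure, the following minimal sketch estimates the concentration giving 50%
inhibition by log-linear interpolation between the two measured points that bracket
half of the control activity. The function name and the example data are
hypothetical, and a fitted sigmoidal model would normally be preferred when enough
data points are available.

```python
import math

def estimate_ic50(concentrations, activities):
    """Estimate the concentration at which activity falls to 50% of control.

    concentrations: positive test-compound concentrations (any consistent unit).
    activities: measured activity as a fraction of the no-compound control
                (1.0 means uninhibited).
    Returns None when the data never cross 50% activity.
    """
    points = sorted(zip(concentrations, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            # Interpolate linearly in activity against log10(concentration).
            frac = (a_lo - 0.5) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical data: activity relative to the control assay.
ic50 = estimate_ic50([0.1, 1.0, 10.0, 100.0], [0.95, 0.75, 0.25, 0.05])
```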


Overexpression Assays 

Overexpression assays are based on the premise that overproduction of a
protein would lead to a higher level of resistance to compounds that selectively
interfere with the function of that protein. Overexpression assays may be used to
identify compounds that interfere with the function of virtually any type of protein,
including without limitation enzymes, receptors, DNA- or RNA-binding proteins, or
any proteins that are directly or indirectly involved in regulating cell growth.

Typically, two fungal strains are constructed. One contains a single copy of
the gene of interest, and a second contains several copies of the same gene.
Identification of useful inhibitory compounds using this type of assay is based on a
comparison of the activity of a test compound in inhibiting growth and/or viability of
the two strains. The method involves constructing a nucleic acid vector that directs
high level expression of a particular target nucleic acid. The vectors are then
transformed into host cells in single or multiple copies to produce strains that express
low to moderate and high levels of the protein encoded by the target sequence (strains A
and B, respectively). Nucleic acid comprising sequences encoding the target gene
can, of course, be directly integrated into the host cell.

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on the growth of the two strains. Agents
which interfere with an unrelated target equally inhibit the growth of both strains.
Agents which interfere with the function of the target at high concentration should
inhibit the growth of both strains. It should be possible, however, to titrate out the
inhibitory effect of the compound in the overexpressing strain. That is, if the
compound is affecting the particular target that is being tested, it should be possible to
inhibit the growth of strain A at a concentration of the compound that allows strain B
to grow.
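
The decision logic of the two-strain comparison can be sketched as follows. This is
an illustrative sketch only: it assumes that a minimum inhibitory concentration
(MIC) has been measured for each strain, and the four-fold shift threshold, the
function name and the category labels are hypothetical choices rather than values
given in this description.

```python
def classify_compound(mic_strain_a, mic_strain_b, min_shift=4.0):
    """Classify a test compound from its MIC against strains A and B.

    mic_strain_a: MIC against strain A (low to moderate target expression).
    mic_strain_b: MIC against strain B (high target expression).
    Use None when no inhibition is seen at the highest concentration tested.
    """
    if mic_strain_a is None and mic_strain_b is None:
        return "inactive"
    if mic_strain_a is not None and (
        mic_strain_b is None or mic_strain_b / mic_strain_a >= min_shift
    ):
        # Strain A is inhibited at a concentration that still lets strain B grow.
        return "candidate target-specific inhibitor"
    # Both strains are inhibited at similar concentrations.
    return "no target-specific shift"

# Hypothetical MIC values in micrograms/ml.
specific = classify_compound(2.0, 32.0)
unrelated = classify_compound(4.0, 4.0)
```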

Alternatively, a fungal strain is constructed that contains the gene of interest
under the control of an inducible promoter. Identification of useful inhibitory agents
using this type of assay is based on a comparison of the activity of a test compound in
inhibiting growth and/or viability of this strain under both inducing and non-inducing
conditions. The method involves constructing a nucleic acid vector that directs high-level
expression of a particular target nucleic acid. The vector is then transformed
into host cells that are grown under both non-inducing and inducing conditions
(conditions A and B, respectively).

Large numbers of compounds (or crude substances which may contain active
compounds) are screened for their effect on growth under these two conditions.
Agents that interfere with the function of the target should inhibit growth under both
conditions. It should be possible, however, to titrate out the inhibitory effect of the
compound under the inducing (overexpressing) condition. That is, if the compound is
affecting the particular target that is being tested, it should be possible to inhibit
growth under condition A at a concentration that allows the strain to grow under
condition B.

Ligand-binding Assays

Many of the targets according to the invention have functions that have not yet
been identified. Ligand-binding assays are useful to identify inhibitor compounds that
interfere with the function of a particular target, even when that function is unknown.
These assays are designed to detect binding of test compounds to particular targets.
The detection may involve direct measurement of binding. Alternatively, indirect
indications of binding may involve stabilization of protein structure or disruption of a
biological function. Non-limiting examples of useful ligand-binding assays are
detailed below.

A useful method for the detection and isolation of binding proteins is the
Biomolecular Interaction Assay (BIAcore) system developed by Pharmacia Biosensor
and described in the manufacturer's protocol (LKB Pharmacia, Sweden). The
BIAcore system uses an affinity purified anti-GST antibody to immobilize GST-fusion
proteins onto a sensor chip. The sensor utilizes surface plasmon resonance,
an optical phenomenon that is sensitive to changes in refractive index. In
accordance with the practice of the invention, a protein of interest is coated onto a
chip and test compounds are passed over the chip. Binding is detected by a change in
the refractive index (surface plasmon resonance).

A different type of ligand-binding assay involves scintillation proximity
assays (SPA, described in U.S. Patent No. 4,568,649).

Another type of ligand binding assay, also undergoing development, is based
on the fact that proteins containing mitochondrial targeting signals are imported into
isolated mitochondria in vitro (Hurt et al., 1985, EMBO J. 4:2061-2068; Eilers and
Schatz, Nature, 1986, 322:228-231). In a mitochondrial import assay, expression
vectors are constructed in which nucleic acids encoding particular target proteins are
inserted downstream of sequences encoding mitochondrial import signals. The
chimeric proteins are synthesized and tested for their ability to be imported into
isolated mitochondria in the absence and presence of test compounds. A test
compound that binds to the target protein should inhibit its uptake into isolated
mitochondria in vitro.

Another ligand-binding assay is the yeast two-hybrid system (Fields and Song,
1989, Nature 340:245-246). The yeast two-hybrid system takes advantage of the
properties of the GAL4 protein of the yeast Saccharomyces cerevisiae. The GAL4
protein is a transcriptional activator required for the expression of genes encoding
enzymes of galactose utilization. This protein consists of two separable and
functionally essential domains: an N-terminal domain which binds to specific DNA
sequences (UASG); and a C-terminal domain containing acidic regions, which is
necessary to activate transcription. The native GAL4 protein, containing both
domains, is a potent activator of transcription when yeast are grown on galactose
media. The N-terminal domain binds to DNA in a sequence-specific manner but is
unable to activate transcription. The C-terminal domain contains the activating
regions but cannot activate transcription because it fails to be localized to UASG. The
two-hybrid system uses two hybrid proteins containing parts of GAL4: (1)
a GAL4 DNA-binding domain fused to a protein 'X' and (2) a GAL4 activation region
fused to a protein 'Y'. If X and Y can form a protein-protein complex and reconstitute
proximity of the GAL4 domains, transcription of a gene regulated by UASG occurs.
Creation of two hybrid proteins, each containing one of the interacting proteins X and
Y, allows the GAL4 activation region to be brought to its normal site of action at UASG.

The binding assay described in Fodor et al., 1991, Science 251:767-773,
which involves testing the binding affinity of test compounds for a plurality of
defined polymers synthesized on a solid substrate, may also be useful.

Compounds which bind to the polypeptides of the invention are potentially 
useful as antifungal agents for use in therapeutic compositions. 

Pharmaceutical formulations suitable for antifungal therapy comprise the
antifungal agent in conjunction with one or more biologically acceptable carriers.
Suitable biologically acceptable carriers include, but are not limited to, phosphate-buffered
saline, saline, deionized water, or the like. Preferred biologically acceptable
carriers are physiologically or pharmaceutically acceptable carriers.

The antifungal compositions include an antifungal effective amount of active
agent. Antifungal effective amounts are those quantities of the antifungal agents of
the present invention that afford prophylactic protection against fungal infections or
which result in amelioration or cure of an existing fungal infection. This antifungal
effective amount will depend upon the agent, the location and nature of the infection,
and the particular host. The amount can be determined by experimentation known in
the art, such as by establishing a matrix of dosages and frequencies and comparing a
group of experimental units or subjects to each point in the matrix.

The antifungal active agents or compositions can be formed into dosage unit
forms, such as, for example, creams, ointments, lotions, powders, liquids, tablets,
capsules, suppositories, sprays, aerosols or the like. If the antifungal composition is
formulated into a dosage unit form, the dosage unit form may contain an antifungal
effective amount of active agent. Alternatively, the dosage unit form may include less
than such an amount if multiple dosage unit forms or multiple dosages are to be used
to administer a total dosage of the active agent. Dosage unit forms can include, in
addition, one or more excipient(s), diluent(s), disintegrant(s), lubricant(s),
plasticizer(s), colorant(s), dosage vehicle(s), absorption enhancer(s), stabilizer(s),
bactericide(s), or the like.

For general information concerning formulations, see, e.g., Gilman et al.
(eds.), 1990, Goodman and Gilman's: The Pharmacological Basis of Therapeutics,
8th ed., Pergamon Press; and Remington's Pharmaceutical Sciences, 17th ed., 1990,
Mack Publishing Co., Easton, PA; Avis et al. (eds.), 1993, Pharmaceutical Dosage
Forms: Parenteral Medications, Dekker, New York; Lieberman et al. (eds.), 1990,
Pharmaceutical Dosage Forms: Disperse Systems, Dekker, New York.

The antifungal agents and compositions of the present invention are useful for
preventing or treating C. albicans infections. Infection prevention methods
incorporate a prophylactically effective amount of an antifungal agent or composition.
A prophylactically effective amount is an amount effective to prevent C. albicans
infection and will depend upon the specific fungal strain, the agent, and the host.
These amounts can be determined experimentally by methods known in the art and as
described above.

C. albicans infection treatment methods incorporate a therapeutically effective
amount of an antifungal agent or composition. A therapeutically effective amount is
an amount sufficient to ameliorate or eliminate the infection. The prophylactically
and/or therapeutically effective amounts can be administered in one administration or
over repeated administrations. Therapeutic administration can be followed by
prophylactic administration, once the initial fungal infection has been resolved.

The antifungal agents and compositions can be administered topically or
systemically. Topical application is typically achieved by administration of creams,
ointments, lotions, or sprays as described above. Systemic administration includes
both oral and parenteral routes. Parenteral routes include, without limitation,
subcutaneous, intramuscular, intraperitoneal, intravenous, transdermal, inhalation and
intranasal administration.

EXEMPLIFICATION 

Cloning and Sequencing C. albicans Genomic Sequence

This invention provides nucleotide sequences of the genome of C. albicans
which thus comprises a DNA sequence library of C. albicans genomic DNA. The
detailed description that follows provides nucleotide sequences of C. albicans, and
also describes how the sequences were obtained and how ORFs (Open Reading
Frames) and protein-coding sequences can be identified. Also described are methods
of using the disclosed C. albicans sequences in methods including diagnostic and
therapeutic applications. Furthermore, the library can be used as a database for
identification and comparison of medically important sequences in this and other
strains of C. albicans as well as other species of Candida.

Chromosomal DNA from strain SC5314 of C. albicans was isolated after
Zymolyase digestion, sodium dodecyl sulfate lysis, potassium acetate precipitation,
phenol:chloroform extraction and ethanol precipitation (Soll, D.R., T. Srikantha and
S.R. Lockhart: Characterizing Developmentally Regulated Genes in C. albicans. In
Microbial Genome Methods. K.W. Adolph, editor. CRC Press. New York, p 17-37.).
Genomic C. albicans DNA was hydrodynamically sheared in an HPLC and then
separated on a standard 1% agarose gel. Fractions corresponding to 2500-3000 bp in
length were excised from the gel and purified by the GeneClean procedure (Bio101,
Inc.).

The purified DNA fragments were then blunt-ended using T4 DNA
polymerase. The healed DNA was then ligated to unique BstXI-linker adapters
(5'-GTCTTCACCACGGGG-3' and 5'-GTGGTGAAGAC-3' in 100-1000 fold molar
excess). These linkers are complementary to the BstXI-cut pGTC vector, while the
overhang is not self-complementary. Therefore, the linkers will not concatemerize
nor will the cut vector religate to itself easily. The linker-adapted inserts were separated
from the unincorporated linkers on a 1% agarose gel and purified using GeneClean.
The linker-adapted inserts were then ligated to BstXI-cut vector to construct a
"shotgun" subclone library.

Only major modifications to the protocols are highlighted. Briefly, the library
was then transformed into DH5α competent cells (Gibco/BRL, DH5α transformation
protocol). It was assessed by plating onto antibiotic plates containing ampicillin and
IPTG/Xgal. The plates were incubated overnight at 37°C. Transformants were then
used for plating of clones and picking for sequencing. The cultures were grown
overnight at 37°C. DNA was purified using a silica bead DNA preparation
method (Engelstein, 1996). In this manner, 25 µg of DNA was obtained per clone.
These purified DNA samples were then sequenced using primarily ABI dye-terminator
chemistry. All subsequent steps were based on sequencing by ABI377
automated DNA sequencing methods. The ABI dye terminator sequence reads were
run on ABI377 machines and the data were transferred to UNIX machines following
lane tracking of the gels. Base calls and quality scores were determined using the
program PHRED (Ewing et al., 1998, Genome Res. 8: 175-185; Ewing and Green,
1998, Genome Res. 8: 685-734). Reads were assembled using PHRAP (P. Green,
Abstracts of DOE Human Genome Program Contractor-Grantee Workshop V, Jan.
1996, p. 157) with default program parameters and quality scores. The initial
assembly was done at 2.3-fold coverage and yielded 5821 contigs.
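
To give a sense of what 2.3-fold coverage implies, the following minimal sketch
applies the idealized Poisson model of random shotgun sequencing, under which the
expected fraction of bases covered by at least one read is 1 - exp(-c) for fold
coverage c. This model is a textbook approximation supplied here for illustration
and is not part of the assembly procedure described above; the read count, read
length and genome size in the example are hypothetical.

```python
import math

def fold_coverage(n_reads, mean_read_length, genome_size):
    """Fold coverage c = (number of reads x mean read length) / genome size."""
    return n_reads * mean_read_length / genome_size

def expected_fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)

# Coverage reported for the initial assembly.
fraction = expected_fraction_covered(2.3)

# Hypothetical numbers showing how fold coverage is computed.
example_c = fold_coverage(n_reads=1000, mean_read_length=500, genome_size=100000)
```

Under this model roughly 90% of the bases are expected to be sampled at 2.3-fold
coverage, which is consistent with an initial assembly that remains split into many
contigs.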

Finishing could follow the initial assembly. Missing mates (sequences from
clones that only gave reads from one end of the Candida DNA inserted in the
plasmid) could be identified and sequenced with ABI technology to allow the
identification of additional overlapping contigs.

End-sequencing of randomly picked genomic lambda clones was also performed.
Sequencing from both ends was done for all lambda clones. The lambda library
backbone helped to verify the integrity of the assembly and allowed closure of some
of the physical gaps. Primers for walking off the ends of contigs would be selected
using pick_primer (a GTC program) near the ends of the clones to facilitate gap
closure. These walks could be sequenced using the selected clones and primers.
These data are then reassembled with PHRAP. Additional sequencing using PCR-generated
templates and screened and/or unscreened lambda templates could also be
done.

To identify C. albicans polypeptides, the complete genomic sequence of C.
albicans was analyzed essentially as follows: First, all possible stop-to-stop open
reading frames (ORFs) greater than 180 nucleotides in all six reading frames were
translated into amino acid sequences. Second, the identified ORFs were analyzed for
homology to known (archaebacterial, prokaryotic and eukaryotic) protein sequences.
Third, the coding potential of non-homologous sequences was evaluated with the
program GENEMARK™ (Borodovsky and McIninch, 1993, Comp. Chem. 17:123).
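
The first step of this analysis can be illustrated with a minimal sketch. It is not
the program actually used; it assumes an unambiguous DNA sequence, treats the ends
of the sequence as ORF boundaries in addition to stop codons (a hypothetical choice
suited to contigs), and translates with the standard genetic code modified so that
CUG (CTG in DNA) encodes serine, as noted in the Background for C. albicans.

```python
STOPS = {"TAA", "TAG", "TGA"}
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
CODON_TABLE["CTG"] = "S"  # non-universal decoding: CUG is read as serine

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def stop_to_stop_orfs(seq, min_len=180):
    """Return (strand, frame, orf) for open stretches longer than min_len nt."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            last = frame + ((len(s) - frame) // 3) * 3
            for pos in range(frame, last, 3):
                if s[pos:pos + 3] in STOPS:
                    if pos - start > min_len:
                        results.append((strand, frame, s[start:pos]))
                    start = pos + 3  # next ORF begins after the stop codon
            if last - start > min_len:
                results.append((strand, frame, s[start:last]))
    return results

def translate(orf):
    """Translate an in-frame ORF; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))

# Hypothetical test sequence: 70 CTG codons between two stop codons.
demo = "TAA" + "CTG" * 70 + "TAG"
orfs = stop_to_stop_orfs(demo)
```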

Identification, Cloning and Expression of C. albicans Nucleic Acids

Expression and purification of the C. albicans polypeptides of the invention
can be performed essentially as outlined below.

To facilitate the cloning, expression and purification of membrane and
secreted proteins from C. albicans, a gene expression system, such as the pET System
(Novagen), for cloning and expression of recombinant proteins in E. coli, is selected.
Also, a DNA sequence encoding a peptide tag, the His-Tag, is fused to the 3' end of
DNA sequences of interest in order to facilitate purification of the recombinant
protein products. The 3' end is selected for fusion in order to avoid alteration of any
5' terminal signal sequence.

PCR Amplification and Cloning of Nucleic Acids Containing ORFs Encoding
Enzymes

Nucleic acids chosen (for example, from the nucleic acids set forth in SEQ ID
NO: 1 - SEQ ID NO: 14103) for cloning from strain SC5314 of C. albicans are
prepared for amplification cloning by polymerase chain reaction (PCR). Synthetic
oligonucleotide primers specific for the 5' and 3' ends of open reading frames
(ORFs) are designed and purchased from GibcoBRL Life Technologies
(Gaithersburg, MD, USA). All forward primers (specific for the 5' end of the
sequence) are designed to include an NcoI cloning site at the extreme 5' terminus.
These primers are designed to permit initiation of protein translation at a methionine
residue followed by a valine residue and the coding sequence for the remainder of the
native C. albicans DNA sequence. All reverse primers (specific for the 3' end of any
C. albicans ORF) include an EcoRI site at the extreme 5' terminus to permit cloning of
each C. albicans sequence into the reading frame of the pET-28b vector. The pET-28b
vector provides sequence encoding an additional 20 carboxy-terminal amino acids
including six histidine residues (at the extreme C-terminus), which comprise the
His-Tag.
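
The primer layout described above can be sketched as follows. This is an
illustrative sketch under stated assumptions, not the design procedure actually
used: the 18-nucleotide annealing length, the choice of GTT as the valine codon, the
replacement of the native start codon, and the removal of the native stop codon are
all hypothetical choices, and the reading frame across the EcoRI junction into the
vector is not checked.

```python
NCOI_SITE = "CCATGG"   # NcoI recognition site; contains the ATG start codon
ECORI_SITE = "GAATTC"  # EcoRI recognition site
STOPS = {"TAA", "TAG", "TGA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def design_primers(orf, anneal_len=18):
    """Return (forward, reverse) primers for an ORF given 5' to 3'.

    Forward: NcoI site at the extreme 5' terminus. The final G of the site
    plus 'TT' forms a GTT valine codon, followed by native sequence starting
    at codon 2 (assumption: the native start codon is replaced).
    Reverse: EcoRI site at the extreme 5' terminus followed by the reverse
    complement of the 3' end of the ORF without its stop codon.
    """
    orf = orf.upper()
    if orf[-3:] in STOPS:
        orf = orf[:-3]  # allow read-through into a C-terminal tag
    forward = NCOI_SITE + "TT" + orf[3:3 + anneal_len]
    reverse = ECORI_SITE + reverse_complement(orf[-anneal_len:])
    return forward, reverse

# Hypothetical ORF used only to show the primer structure.
fwd, rev = design_primers("ATG" + "GCT" * 10 + "TAA")
```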

Genomic DNA prepared from strain SC5314 of C. albicans is used as the
source of template DNA for PCR amplification reactions (Current Protocols in
Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). To
amplify a DNA sequence containing a C. albicans ORF, genomic DNA (50
nanograms) is introduced into a reaction vial containing 2 mM MgCl2, 1 micromolar
synthetic oligonucleotide primers (forward and reverse primers) complementary to
and flanking a defined C. albicans ORF, 0.2 mM of each deoxynucleotide
triphosphate (dATP, dGTP, dCTP, dTTP) and 2.5 units of heat stable DNA polymerase
(Amplitaq, Roche Molecular Systems, Inc., Branchburg, NJ, USA) in a final volume
of 100 microliters.
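
The volumes needed to reach these final concentrations follow from the dilution
relation C1 x V1 = C2 x V2. The sketch below is illustrative only: the stock
concentrations are hypothetical, and the remaining volume is simply assigned to
water and reaction buffer, which are not itemized in the description above.

```python
def volume_for(final_conc, stock_conc, final_volume_ul):
    """Volume of stock (microliters) giving final_conc in final_volume_ul."""
    return final_conc * final_volume_ul / stock_conc

def pcr_mix(final_volume_ul=100.0):
    """Per-reaction volumes (microliters); stock concentrations are assumed."""
    mix = {
        "MgCl2": volume_for(2.0, 25.0, final_volume_ul),            # mM, 25 mM stock
        "forward primer": volume_for(1.0, 100.0, final_volume_ul),  # uM, 100 uM stock
        "reverse primer": volume_for(1.0, 100.0, final_volume_ul),  # uM, 100 uM stock
        "dNTP mix": volume_for(0.2, 10.0, final_volume_ul),         # mM each, 10 mM stock
        "DNA polymerase": 2.5 / 5.0,                                # 2.5 units at 5 units/ul
        "genomic DNA": 50.0 / 50.0,                                 # 50 ng at 50 ng/ul
    }
    mix["water and buffer"] = final_volume_ul - sum(mix.values())
    return mix

mix = pcr_mix()
```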

Upon completion of thermal cycling reactions, each sample of amplified DNA
is washed and purified using the Qiaquick Spin PCR purification kit (Qiagen,
Gaithersburg, MD, USA). All amplified DNA samples are subjected to digestion
with the restriction endonucleases, e.g., NcoI and EcoRI (New England BioLabs,
Beverly, MA, USA) (Current Protocols in Molecular Biology, John Wiley and Sons,
Inc., F. Ausubel et al., eds., 1994). DNA samples are then subjected to
electrophoresis on 1.0% NuSieve (FMC BioProducts, Rockland, ME, USA) agarose
gels. DNA is visualized by exposure to ethidium bromide and long wave UV
irradiation. DNA contained in slices isolated from the agarose gel is purified using
the Bio 101 GeneClean Kit protocol (Bio 101, Vista, CA, USA).

Cloning of C. albicans Nucleic Acids Into an Expression Vector

The pET-28b vector is prepared for cloning by digestion with restriction
endonucleases, e.g., NcoI and EcoRI (Current Protocols in Molecular Biology, John
Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). The pET-28a vector, which
encodes a His-Tag that can be fused to the 5' end of an inserted gene, is prepared by
digestion with appropriate restriction endonucleases.

Following digestion, DNA inserts are cloned (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) into the previously
digested pET-28b expression vector. Products of the ligation reaction are then used to
transform the BL21 strain of E. coli (Current Protocols in Molecular Biology, John
Wiley and Sons, Inc., F. Ausubel et al., eds., 1994) as described below.

Transformation Of Competent Bacteria With Recombinant Plasmids 

Competent bacteria, E. coli strain BL21 or E. coli strain BL21(DE3), are
transformed with recombinant pET expression plasmids carrying the cloned C.
albicans sequences according to standard methods (Current Protocols in Molecular
Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994). Briefly, 1 microliter of
ligation reaction is mixed with 50 microliters of electrocompetent cells and subjected
to a high voltage pulse, after which samples are incubated in 0.45 milliliters SOC
medium (0.5% yeast extract, 2.0% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM
MgCl2, 10 mM MgSO4 and 20 mM glucose) at 37°C with shaking for 1 hour.
Samples are then spread on LB agar plates containing 25 microgram/ml kanamycin
sulfate for growth overnight. Transformed colonies of BL21 are then picked and
analyzed to evaluate cloned inserts as described below.

Identification Of Recombinant Expression Vectors With C. albicans Nucleic Acids

Individual BL21 clones transformed with recombinant pET-28b C. albicans
ORFs are analyzed by PCR amplification of the cloned inserts using the same forward
and reverse primers, specific for each C. albicans sequence, that were used in the
original PCR amplification cloning reactions. Successful amplification verifies the
integration of the C. albicans sequences in the expression vector (Current Protocols in
Molecular Biology, John Wiley and Sons, Inc., F. Ausubel et al., eds., 1994).

Isolation and Preparation of Nucleic Acids From Transformants 

Individual clones of recombinant pET-28b vectors carrying properly cloned C.
albicans ORFs are picked and incubated in 5 ml of LB broth plus 25 microgram/ml
kanamycin sulfate overnight. The following day plasmid DNA is isolated and
purified using the Qiagen plasmid purification protocol (Qiagen Inc., Chatsworth, CA,
USA).

Expression Of Recombinant C. albicans Sequences In E. coli

The pET vector can be propagated in any E. coli K-12 strain, e.g., HMS174,
HB101, JM109, DH5, etc., for the purpose of cloning or plasmid preparation. Hosts
for expression include E. coli strains containing a chromosomal copy of the gene for
T7 RNA polymerase. These hosts are lysogens of bacteriophage DE3, a lambda
derivative that carries the lacI gene, the lacUV5 promoter and the gene for T7 RNA
polymerase. T7 RNA polymerase is induced by addition of isopropyl-β-D-thiogalactoside
(IPTG), and the T7 RNA polymerase transcribes any target plasmid,
such as pET-28b, carrying its gene of interest. Strains used include: BL21(DE3)
(Studier, F.W., Rosenberg, A.H., Dunn, J.J., and Dubendorff, J.W. (1990) Meth.
Enzymol. 185, 60-89).

To express recombinant C. albicans sequences, 50 nanograms of plasmid
DNA isolated as described above is used to transform competent BL21(DE3) bacteria
(provided by Novagen as part of the pET expression system kit) as described above.
The lacZ gene (beta-galactosidase) is expressed in the pET-System as described for
the C. albicans recombinant constructions. Transformed cells are cultured in SOC
medium for 1 hour, and the culture is then plated on LB plates containing 25
micrograms/ml kanamycin sulfate. The following day, the transformed colonies are
pooled and grown in LB medium containing kanamycin sulfate (25 micrograms/ml) to
an optical density at 600 nm of 0.5 to 1.0 O.D. units, at which point 1 millimolar IPTG
is added to the culture for 3 hours to induce gene expression of the C. albicans
recombinant DNA constructions.

After induction of gene expression with IPTG, bacteria are pelleted by
centrifugation in a Sorvall RC-3B centrifuge at 3500 x g for 15 minutes at 4°C.
Pellets are resuspended in 50 milliliters of cold 10 mM Tris-HCl, pH 8.0, 0.1 M NaCl
and 0.1 mM EDTA (STE buffer). Cells are then centrifuged at 2000 x g for 20 min at
4°C. Wet pellets are weighed and frozen at -80°C until ready for protein purification.

A variety of methodologies known in the art can be utilized to purify the
isolated proteins (Current Protocols in Protein Science, John Wiley and Sons, Inc., J.
E. Coligan et al., eds., 1995). For example, the frozen cells are thawed, resuspended in
buffer and ruptured by several passages through a small volume microfluidizer
(Model M-110S, Microfluidics International Corporation, Newton, MA). The
resultant homogenate is centrifuged to yield a clear supernatant (crude extract) and,
following filtration, the crude extract is fractionated over columns. Fractions are
monitored by absorbance at 280 nm, and peak fractions may be analyzed by SDS-PAGE.

The concentrations of purified protein preparations are quantified
spectrophotometrically using absorbance coefficients calculated from amino acid
content (Perkins, S.J. 1986 Eur. J. Biochem. 157, 169-180). Protein concentrations
are also measured by the method of Bradford, M.M. (1976) Anal. Biochem. 72, 248-254,
and Lowry, O.H., Rosebrough, N., Farr, A.L. & Randall, R.J. (1951) J. Biol.
Chem. 193, pages 265-275, using bovine serum albumin as a standard.
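
The spectrophotometric calculation can be illustrated with a minimal sketch based on
the Beer-Lambert law, A = epsilon x c x l. Note the assumptions: the per-residue
coefficients used here (5500 for tryptophan, 1490 for tyrosine and 125 for each
cystine, in M^-1 cm^-1 at 280 nm) are commonly used approximate values and are
swapped in for illustration; they are not the coefficients of the Perkins reference
cited above. The sequence, molecular weight and absorbance reading are
hypothetical.

```python
def molar_extinction_280(sequence, n_cystines=0):
    """Approximate molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    sequence = sequence.upper()
    return 5500 * sequence.count("W") + 1490 * sequence.count("Y") + 125 * n_cystines

def concentration_mg_per_ml(a280, sequence, mol_weight, path_cm=1.0, n_cystines=0):
    """Protein concentration from absorbance using A = epsilon * c * l."""
    epsilon = molar_extinction_280(sequence, n_cystines)
    if epsilon == 0:
        raise ValueError("no Trp, Tyr or cystine: A280 cannot be used")
    molar = a280 / (epsilon * path_cm)  # mol/l
    return molar * mol_weight           # g/l, numerically equal to mg/ml

# Hypothetical protein: two Trp and one Tyr, 10 kDa, A280 reading of 0.6245.
conc = concentration_mg_per_ml(0.6245, "MWWYK", mol_weight=10000.0)
```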

SDS-polyacrylamide gels of various concentrations are purchased from
BioRad (Hercules, CA, USA), and stained with Coomassie blue. Molecular weight
markers may include rabbit skeletal muscle myosin (200 kDa), E. coli β-galactosidase
(116 kDa), rabbit muscle phosphorylase B (97.4 kDa), bovine serum albumin (66.2
kDa), ovalbumin (45 kDa), bovine carbonic anhydrase (31 kDa), soybean trypsin
inhibitor (21.5 kDa), egg white lysozyme (14.4 kDa) and bovine aprotinin (6.5 kDa).



EQUIVALENTS 

Those skilled in the art will recognize, or be able to ascertain using no more
than routine experimentation, many equivalents to the specific embodiments and
methods described herein. The specific embodiments described herein are offered by
way of example only, and the invention is to be limited only by the terms of the
appended claims, along with the full scope of equivalents to which such claims are
entitled.
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hypothetical 5 1 .6 kd protein in 
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hsp26-secl8 intergenic region. 


hypothetical 5 1 .6 kd protein in 
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hypothetical 62.6 kd protein in 
cdsl-rp!2 intergenic region. 


hypothetical 44.2 kd protein in 
hypothetical 19.9 kd protein in fur4-
hypothetical 34.5 kd protein in
ste50-his4 intergenic region.
bud3 intergenic region.
mrps9-ysw1 intergenic region.
hypothetical 29.0 kd protein in
pwp2-sup61 intergenic region.
hypothetical 47.2 kd protein in
not1/cdc39-hmr intergenic region.


hypothetical 20.7 kd protein in 
kin82 5'region. 


hypothetical 21.7 kd protein in tup1-
abp1 intergenic region.


hypothetical 48.5 kd protein in ers1-
srb8 intergenic region.


hypothetical 13.6 kd protein in cpr4-
sol2 intergenic region.
hypothetical 25.0 kd protein in
wbp1-mnn1 intergenic region
precursor.


hypothetical 27.7 kd protein in isc10
3'region.


hypothetical 27.7 kd protein in isc10
3'region.


hypothetical 27.7 kd protein in isc10
3'region.


hypothetical 34.8 kd protein in
rad24-bmh1 intergenic region.


hypothetical 34.8 kd protein in
rad24-bmh1 intergenic region.


hypothetical 111.4 kd protein
c26f1.08c in chromosome i.


hypothetical 57.2 kd protein
c12b10.16c in chromosome i.


hypothetical 57.2 kd protein
c12b10.16c in chromosome i.


hypothetical 16.9 kd protein
c12b10.15c in chromosome i.


hypothetical 37.0 kd protein in 
rpl41a-inh1 intergenic region.


hypothetical protein c22e12.01 in
chromosome i (fragment).


hypothetical 52.2 kd protein in ada2
3'region.


hypothetical 18.9 kd protein in slu7
3'region.


hypothetical 33.2 kd protein in sss1-
slu7 intergenic region.


hypothetical 23.1 kd protein in
pdc2-afr1 intergenic region.
hypothetical 115.9 kd protein in
pem1-rpl15b intergenic region.


hypothetical 115.9 kd protein in
pem1-rpl15b intergenic region.


hypothetical 115.9 kd protein in
pem1-rpl15b intergenic region.


hypothetical 50.8 kd protein in
pau2-gly1 intergenic region.


hypothetical 18.5 kd protein in gly1-
gda1 intergenic region.


hypothetical 18.5 kd protein in gly1-
gda1 intergenic region.


hypothetical 106.1 kd protein in
gly1-gda1 intergenic region.


hypothetical 35.6 kd protein in 
mcm3-vma3 intergenic region. 


hypothetical 78.3 kd protein in rip1-
ura3 intergenic region.


hypothetical 78.3 kd protein in rip1-
ura3 intergenic region.


hypothetical 57.4 kd protein in 
mms21-ubc8 intergenic region. 


hypothetical 61.3 kd protein in 
mms21-ubc8 intergenic region. 


hypothetical 64.0 kd protein in 
hypothetical 72.5 kd protein in
gcn4-wbp1 intergenic region.


hypothetical 14.3 kd protein in
hypothetical 25.0 kd protein in
precursor.
hypothetical 14.4 kd protein in rnr1-
ald3 intergenic region.


hypothetical 56.6 kd protein in 
gep2-icll intergenic region. 


hypothetical 56.5 kd protein in caj1-
hom3 intergenic region.


hypothetical 74.0 kd protein in caj1-
hom3 intergenic region.


hypothetical 74.0 kd protein in caj1-
hom3 intergenic region.


hypothetical 100.3 kd protein in
mei4-caj1 intergenic region.


hypothetical 17.1 kd protein in
sah1-mei4 intergenic region.


yemanuclein-alpha. 


yemanuclein-alpha. 


hypothetical 18.3 kd protein in
gal83-ypt8 intergenic region.


hypothetical 22.4 kd protein in
gal83-ypt8 intergenic region.


hypothetical 53.9 kd protein in afg3-
seb2 intergenic region.


hypothetical 53.9 kd protein in afg3-
seb2 intergenic region.


hypothetical 25.6 kd protein in ntf2-
srp1 intergenic region.


hypothetical 26.9 kd protein in
mnn1-pmi40 intergenic region.


hypothetical 26.8 kd protein in hxt8
5'region.


hypothetical 20.7 kd protein in hxt8-
can1 intergenic region.
hypothetical 64.8 kd protein in gdi1-
cox15 intergenic region.


hypothetical 26.2 kd protein in gdi1-
cox15 intergenic region.


hypothetical 16.6 kd protein in gdi1-
cox15 intergenic region.


hypothetical 20.4 kd protein in glc7-
gdi1 intergenic region.


hypothetical 195.4 kd protein in
rps26b-glc7 intergenic region.


hypothetical 23.5 kd protein in rsp5-
pak1 intergenic region.


hypothetical 23.5 kd protein in rsp5-
pak1 intergenic region.


hypothetical 40.8 kd protein in rsp5-
pak1 intergenic region.


hypothetical 29.7 kd protein in rsp5-
pak1 intergenic region.


hypothetical 29.7 kd protein in rsp5-
pak1 intergenic region.


hypothetical 81.5 kd protein in ussl- 
bebl intergenic region. 


hypothetical 164.4 kd protein in 
met6-pup3 intergenic region. 


hypothetical 33.9 kd protein in
rps24ea-ilv1 intergenic region.


hypothetical 62.3 kd protein in
rps24ea-ilv1 intergenic region.


hypothetical 72.4 kd protein in
rps24ea-ilv1 intergenic region.


hypothetical 72.4 kd protein in
hypothetical 79.5 kd protein in
rps24ea-ilv1 intergenic region.
hypothetical 18.1 kd protein in
snp2-mdj1 intergenic region.


hypothetical 78.8 kd protein in
hsp12-hxt10 intergenic region.


hypothetical 207.6 kd protein in
smc1-sec4 intergenic region.


hypothetical 207.6 kd protein in
smc1-sec4 intergenic region.


hypothetical 207.6 kd protein in
smc1-sec4 intergenic region.


hypothetical 207.6 kd protein in
smc1-sec4 intergenic region.


hypothetical 28.8 kd protein in
smc1-sec4 intergenic region.


hypothetical 95.4 kd protein in sec4-
msh4 intergenic region.


hypothetical 92.5 kd protein in
bem2-spt2 intergenic region.


hypothetical 38.2 kd protein in
bem2-spt2 intergenic region.


hypothetical 49.5 kd protein in
ubp3-pet122 intergenic region.


hypothetical 49.5 kd protein in
ubp3-pet122 intergenic region.


hypothetical 45.7 kd protein in
ubp5-spt15 intergenic region.


hypothetical 45.7 kd protein in
ubp5-spt15 intergenic region.


hypothetical 45.7 kd protein in
ubp5-spt15 intergenic region.


hypothetical 45.7 kd protein in
ubp5-spt15 intergenic region.


hypothetical 47.4 kd protein in
mag1-ubp5 intergenic region.
hypothetical 55.1 kd protein in fab1-
pes4 intergenic region.


hypothetical 55.1 kd protein in fab1-
pes4 intergenic region.


hypothetical 137.7 kd protein in
ugs1-fab1 intergenic region.


hypothetical 137.7 kd protein in
ugs1-fab1 intergenic region.


hypothetical 18.2 kd protein in
nic96-mpr1 intergenic region.


hypothetical 23.6 kd protein in
deg1-nic96 intergenic region.


hypothetical 23.6 kd protein in
deg1-nic96 intergenic region.


hypothetical 25.2 kd protein in thi5
5'region and in rpd3 5'region.


hypothetical 82.2 kd protein in 
emp47-sec53 intergenic region. 


hypothetical 82.2 kd protein in 
emp47-sec53 intergenic region. 


hypothetical 24.0 kd protein in 
emp47-sec53 intergenic region. 


hypothetical 33.5 kd protein in
sec53-act1 intergenic region.


hypothetical 30.1 kd protein in
rpo41-hac1 intergenic region.


hypothetical 119.5 kd protein in
rpo41-hac1 intergenic region.


hypothetical 57.6 kd protein in
cak1-ste2 intergenic region.


hypothetical 57.6 kd protein in
cak1-ste2 intergenic region.


hypothetical 96.7 kd protein in ste2-
frs2 intergenic region.
hypothetical 38.5 kd protein in erv1-
gls2 intergenic region.


hypothetical 33.3 kd protein in 
vma7-rps31a intergenic region. 


hypothetical 27.8 kd protein in 
vma7-rps31a intergenic region. 


hypothetical 57.5 kd protein in 
vma7-rps31a intergenic region. 


hypothetical 34.7 kd protein in
msb2-uga1 intergenic region.


hypothetical 71.4 kd protein in sec9-
msb2 intergenic region.


hypothetical 52.9 kd protein in
pmc1-tfg2 intergenic region.


hypothetical 55.2 kd protein in
pmc1-tfg2 intergenic region.


hypothetical 22.2 kd protein in
pmc1-tfg2 intergenic region.


hypothetical 75.9 kd protein in
sap155-ymr31 intergenic region.


hypothetical 75.9 kd protein in
sap155-ymr31 intergenic region.


hypothetical 52.9 kd protein in
hypothetical 31.9 kd protein in
rpl5b-qcr6 intergenic region.


hypothetical 76.3 kd protein in
cdc14-met10 intergenic region.


hypothetical 31.8 kd protein in his2-
hypothetical 31.8 kd protein in his2-
hypothetical 55.1 kd protein in fab1-
pes4 intergenic region.
hypothetical 20.8 kd protein in
mic1-srb5 intergenic region.


hypothetical 38.8 kd protein in
mic1-srb5 intergenic region.


hypothetical 28.3 kd protein in
vas1-ask10 intergenic region.


hypothetical 58.2 kd protein in
dbf2-vas1 intergenic region.


hypothetical 140.5 kd protein in
ctt1-prp31 intergenic region.


hypothetical 140.5 kd protein in
ctt1-prp31 intergenic region.


hypothetical 140.5 kd protein in
ctt1-prp31 intergenic region.


hypothetical 38.3 kd protein in
rpl16b-pdc6 intergenic region.


hypothetical 38.3 kd protein in
rpl16b-pdc6 intergenic region.


hypothetical 28.6 kd protein in
mup1-spr3 intergenic region.


hypothetical 106.7 kd protein in
mup1-spr3 intergenic region.


hypothetical 106.7 kd protein in
mup1-spr3 intergenic region.


hypothetical 71.3 kd protein in
scm4-mup1 intergenic region.


hypothetical 44.2 kd protein in
rme1-tfc4 intergenic region.


hypothetical 25.2 kd protein in
acb1-kss1 intergenic region.


hypothetical 27.6 kd protein in
rpl26b-acb1 intergenic region.


hypothetical 27.2 kd protein in gls2-
rpl26b intergenic region.
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hypothetical 45.2 kd gtp-binding 
protein intrxl-rtal intergenic 
region. 


hypothetical 33.3 kd protein in 
ade3-ser2 intergenic region. 


hypothetical trp-asp repeats
containing protein in pmt6-pct1
intergenic region.


hypothetical trp-asp repeats
containing protein in pmt6-pct1
intergenic region.


hypothetical 95.4 kd protein in
sng1-pmt6 intergenic region.


hypothetical 68.3 kd protein in
pdx1-sng1 intergenic region.


hypothetical 68.3 kd protein in
pdx1-sng1 intergenic region.


hypothetical 52.8 kd protein in
bub1-hip1 intergenic region.


hypothetical 52.8 kd protein in
bub1-hip1 intergenic region.


hypothetical 78.8 kd protein in erg1-
rnr4 intergenic region.


hypothetical 39.6 kd protein in
hypothetical 38.6 kd protein in
hypothetical 48.5 kd protein in apl6-
mes1 intergenic region.
mes1 intergenic region.


putative mitochondrial carrier 
ygr257c. 


hypothetical 22.3 kd protein in
mga1-gcn4 intergenic region.


hypothetical 26.7 kd protein in tds4-
mga1 intergenic region.


hypothetical 86.6 kd protein in
pfk1-tds4 intergenic region.


hypothetical 86.6 kd protein in
pfk1-tds4 intergenic region.
hypothetical 31.3 kd protein in
taf145-yor1 intergenic region.


hypothetical 40.2 kd protein in
taf145-yor1 intergenic region.


hypothetical 34.3 kd protein in
taf145-yor1 intergenic region.


hypothetical 62.8 kd protein in
taf145-yor1 intergenic region.


hypothetical 17.9 kd protein in yta7-
taf145 intergenic region.


hypothetical 17.9 kd protein in yta7-
taf145 intergenic region.


hypothetical 224.8 kd protein in
yta7-taf145 intergenic region.
yta7-taf145 intergenic region.
mes1-fol2 intergenic region.
hypothetical 32.1 kd protein in
mad1-scy1 intergenic region.


hypothetical 65.3 kd protein in
mad1-scy1 intergenic region.


hypothetical 65.3 kd protein in
mad1-scy1 intergenic region.


hypothetical 15.0 kd protein in
scy1-hnm1 intergenic region.


hypothetical 43.5 kd protein in 
rpb9-alg2 intergenic region. 


hypothetical 43.5 kd protein in
rpb9-alg2 intergenic region.


hypothetical 72.9 kd protein in
rpb9-alg2 intergenic region.


hypothetical 73.1 kd protein in
pyc1-ubc2 intergenic region.


hypothetical 15.9 kd protein in ole1-
tif4632 intergenic region.


hypothetical 30.8 kd protein in ole1-
tif4632 intergenic region.


hypothetical 29.4 kd protein in
sug1-rna15 intergenic region.


hypothetical 14.4 kd protein in
rpl32-cwh41 intergenic region.


hypothetical 56.4 kd protein in
rpl32-cwh41 intergenic region
hypothetical 27.1 kd protein in alk1-
ckb1 intergenic region.


hypothetical 21.8 kd protein in
ckb1-ate1 intergenic region.


hypothetical 35.0 kd protein in bgl2-
zuo1 intergenic region.
hypothetical 37.4 kd protein in
sec27-ssm1b intergenic region.


hypothetical 145.6 kd protein in
ssm1b-ceg1 intergenic region.


hypothetical 163.2 kd protein in
ssm1b-ceg1 intergenic region.


hypothetical 163.2 kd protein in
ssm1b-ceg1 intergenic region.


hypothetical 163.2 kd protein in
ssm1b-ceg1 intergenic region.


hypothetical 163.2 kd protein in
ssm1b-ceg1 intergenic region.


hypothetical 55.6 kd protein in
ceg1-soh1 intergenic region.


hypothetical 73.5 kd protein in scs3- 
sup44 intergenic region. 


hypothetical 80.0 kd protein in snf4- 
taf60 intergenic region. 


hypothetical 80.0 kd protein in snf4- 
taf60 intergenic region. 


hypothetical 77.3 kd protein in snf4- 
taf60 intergenic region. 


hypothetical 51.9 kd protein in
taf60-g4p1 intergenic region.


hypothetical 72.0 kd protein in
taf60-g4p1 intergenic region.


hypothetical 25.3 kd protein in
cyh2-seh1 intergenic region.


hypothetical 17.8 kd protein in rpod
hypothetical 104.8 kd protein in
pan2-nup145 intergenic region.
hypothetical 20.1 kd protein in
pde1-cse1 intergenic region.


hypothetical 113.9 kd protein in
pde1-cse1 intergenic region.


hypothetical 113.9 kd protein in
pde1-cse1 intergenic region.


hypothetical 113.9 kd protein in
pde1-cse1 intergenic region.


hypothetical 75.4 kd protein in 
hap2-ade5,6 intergenic region. 


hypothetical 33.6 kd protein in
sec15-sap4 intergenic region.


hypothetical 21.5 kd protein in
sec15-sap4 intergenic region.


hypothetical 32.0 kd protein in
gog5-clg1 intergenic region.


hypothetical 21.9 kd protein in
vam7-ypt32 intergenic region.


hypothetical 167.1 kd protein in
emp24-gcn1 intergenic region.


hypothetical 50.3 kd protein in
ace1-rad54 intergenic region.


hypothetical 34.8 kd protein in sut1-
rck1 intergenic region.


hypothetical 41.6 kd protein in sut1-
rck1 intergenic region.


hypothetical 78.1 kd protein in
tip20-mrf1 intergenic region.


hypothetical 72.6 kd protein in
mrf1-sec27 intergenic region.


hypothetical 72.6 kd protein in
mrf1-sec27 intergenic region.


hypothetical 90.8 kd protein in
mrf1-sec27 intergenic region.
hypothetical 53.1 kd protein in
spo11-opi1 intergenic region.


hypothetical 36.1 kd protein in ylfl- 
prps4 intergenic region. 


hypothetical 67.5 kd protein in 
prps4-ste20 intergenic region. 


hypothetical 67.5 kd protein in 
prps4-ste20 intergenic region. 


hypothetical 38.0 kd protein in 
prps4-ste20 intergenic region. 


hypothetical 38.0 kd protein in 
prps4-ste20 intergenic region. 


hypothetical 51.2 kd protein in lag1-
rpl14b intergenic region.


hypothetical 66.3 kd protein in hag2
5'region.


hypothetical 33.8 kd protein in twt1-
pho12 intergenic region.


hypothetical 33.8 kd protein in twt1-
pho12 intergenic region.


hypothetical 60.5 kd protein in
skn7-twt1 intergenic region.


hypothetical 69.0 kd protein in
ppx1-rps7a intergenic region.


hypothetical 69.0 kd protein in
ppx1-rps7a intergenic region.


hypothetical 22.8 kd protein in
pde1-cse1 intergenic region.


hypothetical 44.5 kd protein in
pde1-cse1 intergenic region.


hypothetical 44.5 kd protein in
pde1-cse1 intergenic region.


hypothetical 45.9 kd protein in
pde1-cse1 intergenic region.
hypothetical 62.7 kd protein in
cdc12-orc6 intergenic region.


hypothetical 68.3 kd protein in
cdc12-orc6 intergenic region.


hypothetical 64.3 kd protein in
cdc12-orc6 intergenic region.


hypothetical 64.3 kd protein in
cdc12-orc6 intergenic region.


hypothetical 24.6 kd protein in
nrk1-cdc12 intergenic region.


hypothetical 96.4 kd protein in
nrk1-cdc12 intergenic region.
nrk1 intergenic region.
nrk1 intergenic region.
hxt5-nrk1 intergenic region.


hypothetical 433.2 kd protein in
hypothetical 433.2 kd protein in
hypothetical 30.5 kd protein in
ths1 intergenic region.


hypothetical 102.4 kd protein in
sen3-hop1 intergenic region.
hop1-rps24eb intergenic region.
rnr3 intergenic region.


hypothetical 17.7 kd protein in rnr3-
hypothetical 17.1 kd protein in rnr3-
hypothetical protein in ifm1
3'region (fragment).


hypothetical 71.4 kd protein in 
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hypothetical 30.3 kd protein in 
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36.7 kd protein in cbr5-not3 
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hypothetical zinc aminopeptidase 
hypothetical 269.9 kd protein in
fkh1-sth1 intergenic region.


hypothetical 269.9 kd protein in
fkh1-sth1 intergenic region.


hypothetical 269.9 kd protein in
fkh1-sth1 intergenic region.


hypothetical 123.6 kd protein in
nup159-cox5b intergenic region.


hypothetical 42.5 kd protein in
cox5b-pfk26 intergenic region.


hypothetical 103.6 kd protein in
cox5b-pfk26 intergenic region.


hypothetical zinc metalloproteinase
hypothetical 27.4 kd protein in
pfk26-sga1 intergenic region.


hypothetical 59.2 kd protein in
hypothetical 154.9 kd protein in
cpr7-pet191 intergenic region.
cpr7-pet191 intergenic region.


hypothetical 27.4 kd protein in hyr1
3'region.


hypothetical 26.8 kd protein in hyr1
hypothetical 26.8 kd protein in hyr1
3'region.


hypothetical 26.8 kd protein in hyr1
containing protein in dbf8-met28
putative atp-dependent rna helicase
putative atp-dependent rna helicase
gef1-nup85 intergenic region.
hypothetical 49.0 kd protein in
nsp1-kar2 intergenic region.


hypothetical 41.2 kd protein in
pet130-cct3 intergenic region.


hypothetical 75.5 kd protein in cct3-
cct8 intergenic region.


hypothetical 18.6 kd protein in cct3-
cct8 intergenic region.


hypothetical 77.7 kd protein in cct3-
cct8 intergenic region.


hypothetical 77.7 kd protein in cct3-
cct8 intergenic region.


hypothetical 14.1 kd protein in cyr1-
ost1 intergenic region.


hypothetical 182.0 kd protein in
nmd5-hom6 intergenic region.


hypothetical 182.0 kd protein in
nmd5-hom6 intergenic region.


hypothetical 182.0 kd protein in
nmd5-hom6 intergenic region.


hypothetical 39.0 kd protein in
zms1-mns1 intergenic region.


hypothetical 45.1 kd protein in rps5-
zms1 intergenic region.


hypothetical 23.6 kd protein in
cpa2-atp2 intergenic region.


hypothetical 32.0 kd protein in
cpa2-atp2 intergenic region.


hypothetical 32.2 kd protein in
cpa2-atp2 intergenic region.


hypothetical 32.2 kd protein in
cpa2-atp2 intergenic region.


hypothetical 80.2 kd protein in 
cpa2-atp2 intergenic region. 
hypothetical 30.6 kd protein in
scp160-mrpl8 intergenic region
precursor.


hypothetical 30.6 kd protein in
scp160-mrpl8 intergenic region
precursor.


hypothetical 30.6 kd protein in
scp160-mrpl8 intergenic region
precursor.


hypothetical 89.2 kd protein in
scp160-mrpl8 intergenic region.


hypothetical 25.1 kd protein in
scp160-mrpl8 intergenic region.


hypothetical 28.5 kd protein in
scp160-mrpl8 intergenic region.


hypothetical 94.9 kd protein in
mrpl8-nup82 intergenic region.


hypothetical 94.9 kd protein in
mrpl8-nup82 intergenic region.


hypothetical 46.4 kd protein in
nup82-pep8 intergenic region.


hypothetical 26.9 kd protein in
nup82-pep8 intergenic region.


hypothetical 54.2 kd protein in
nup82-pep8 intergenic region.


hypothetical protein in dfr1 3'region
(fragment).


hypothetical 191.5 kd protein in
nsp1-kar2 intergenic region.


hypothetical 191.5 kd protein in
nsp1-kar2 intergenic region.


hypothetical 191.5 kd protein in
nsp1-kar2 intergenic region.
hypothetical 53.5 kd protein in
gcd14-pos18 intergenic region.


hypothetical 53.5 kd protein in
gcd14-pos18 intergenic region.


hypothetical 19.3 kd protein in
gcd14-pos18 intergenic region.


hypothetical 200.0 kd protein in
gzf3-sme1 intergenic region.


hypothetical 200.0 kd protein in
gzf3-sme1 intergenic region.


hypothetical 200.0 kd protein in
gzf3-sme1 intergenic region.


hypothetical 16.2 kd protein in
sme1-mef2 intergenic region.


hypothetical 70.2 kd protein in
gsh1-chs6 intergenic region.


hypothetical 70.2 kd protein in
gsh1-chs6 intergenic region.


hypothetical 24.5 kd protein in
sap185-bck1 intergenic region.


hypothetical 24.5 kd protein in
sap185-bck1 intergenic region.


hypothetical 56.4 kd protein in srs2-
sip4 intergenic region.


hypothetical 117.2 kd protein in
trl1-act3 intergenic region.


hypothetical 82.5 kd protein in trl1-
act3 intergenic region.


hypothetical 82.5 kd protein in trl1-
act3 intergenic region.


hypothetical 30.6 kd protein in
scp160-mrpl8 intergenic region
precursor.
hypothetical 61.5 kd protein in tpk1-
far1 intergenic region.


hypothetical 23.2 kd protein in tpk1-
far1 intergenic region precursor.


hypothetical 23.2 kd protein in tpk1-
far1 intergenic region precursor.


hypothetical 23.2 kd protein in tpk1-
far1 intergenic region precursor.


hypothetical 23.2 kd protein in tpk1-
far1 intergenic region precursor.


hypothetical 23.2 kd protein in tpk1-
far1 intergenic region precursor.


hypothetical 23.2 kd protein in tpk1-
far1 intergenic region precursor.


hypothetical 76.2 kd protein in far1-
fbp26 intergenic region.


hypothetical 76.2 kd protein in far1-
fbp26 intergenic region.


hypothetical 77.4 kd protein in ino1-
ids2 intergenic region.


hypothetical 26.9 kd protein in ino1-
ids2 intergenic region.


hypothetical 34.4 kd protein in ids2-
mpi2 intergenic region.


hypothetical 47.4 kd protein in
rps25b-mrs3 intergenic region.


hypothetical 41.5 kd protein in
mrs3-ura2 intergenic region.
hypothetical 27.4 kd protein in
mer2-cpr7 intergenic region.


hypothetical 22.5 kd protein in
spc1-ilv3 intergenic region.


hypothetical 35.6 kd protein in
spc1-ilv3 intergenic region.


hypothetical 38.5 kd protein in sui2-
tdh2 intergenic region.


hypothetical 62.2 kd protein in pre3-
sag1 intergenic region.


hypothetical 67.0 kd protein in pre3-
sag1 intergenic region.
nuc1-prp21 intergenic region.
nuc1-prp21 intergenic region.
hypothetical 35.8 kd protein in
prp16-srp40 intergenic region.


hypothetical 15.1 kd protein in
nup133-hbs1 intergenic region.


hypothetical 39.6 kd protein in
mtd1-nup133 intergenic region.


hypothetical 39.6 kd protein in
mtd1-nup133 intergenic region.


hypothetical 96.8 kd protein in sis2-
mtd1 intergenic region.


hypothetical 18.4 kd protein in sis2-
mtd1 intergenic region.


hypothetical 38.5 kd protein in
ccp1-sis2 intergenic region.


hypothetical 39.4 kd protein in
ccp1-sis2 intergenic region.


hypothetical 39.4 kd protein in
ccp1-sis2 intergenic region.


hypothetical 22.0 kd protein in las1-
ccp1 intergenic region.


hypothetical 22.0 kd protein in las1-
ccp1 intergenic region.


hypothetical 31.6 kd protein in tif1-
ktr2 intergenic region.


hypothetical 48.8 kd protein in trk2-
hypothetical 46.6 kd protein in
sap190-spo14 intergenic region.


hypothetical 32.1 kd protein in
ypt52-gcn3 intergenic region.
ypt52-gcn3 intergenic region.



CONTIG3440


CONTIG3125


CONTIG5762


CONTIG4548


CONTIG2893


CONTIG1834


b2x18881.x


CONTIG5763


CONTIG5720


CONTIG4779


CONTIG2650
32476432_c3_3


15176 


15175 


15174 


15173 


15172 


15171 


15170 


15169 


15168 


15167 


15166 


15165 


15164 


15163 


15162 


15161


1428


1206


1281


1353


1191


1116


3180


1803


1092


1060


P36095 


P36096 


P36097 


P36097 


P36097 


P36104 


P34243 


P34243 


P34241 


P34241 


P34241 


P36108 


P36166 


P36165 


P36165 


P36164 


9.8(10)-18


5.5(10)-36


0.02


0.11


1.1(10)-23


0.42999


4.7(10)-22


1.3(10)-7


7.4(10)-19


8.6(10)-10


6.4(10)-30


Saccharomyces
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


hypothetical 26.2 kd protein in
phd1-ptm1 intergenic region.


hypothetical 87.9 kd protein
precursor in ptm1-ixr1 intergenic
region.


hypothetical 118.9 kd protein in
ptm1-ixr1 intergenic region.


hypothetical 118.9 kd protein in
ptm1-ixr1 intergenic region.


hypothetical 118.9 kd protein in
hypothetical 37.1 kd protein in
ram2-atp7 intergenic region.


hypothetical 78.3 kd protein in
ram2-atp7 intergenic region.
put3-cce1 intergenic region.
hypothetical 203.3 kd protein in
put3-cce1 intergenic region.


hypothetical 16.7 kd protein mrp17-
met14 intergenic region.
prp16-srp40 intergenic region.
prp16-srp40 intergenic region.
hypothetical 37.4 kd protein in
region.
hypothetical 121.1 kd protein in
hypothetical 121.1 kd protein in
bio3-hxt17 intergenic region
hypothetical 121.1 kd protein in
bio3-hxt17 intergenic region
precursor.


hypothetical 36.4 kd protein in
pop2-hol1 intergenic region.


hypothetical 36.4 kd protein in
pop2-hol1 intergenic region.


hypothetical gtp-binding protein in
pop2-hol1 intergenic region.


hypothetical gtp-binding protein in
hypothetical 57.7 kd protein in lys9-
pop2 intergenic region.


hypothetical 15.1 kd protein in
pet494-mso1 intergenic region.
coq2 intergenic region.


putative atp-dependent rna helicase
hypothetical 39.6 kd protein in sol1-
coq2 intergenic region.
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hypothetical 43.7 kd protein in yip3- 
tfc5 intergenic region. 


hypothetical 97.0 kd protein in yip3- 
tfc5 intergenic region. 


hypothetical 51.0 kd protein in yip3- 
tfc5 intergenic region. 


hypothetical 43.8 kd protein in 
nce3-hht2 intergenic region. 


hypothetical 32.8 kd protein in 
nce3-hht2 intergenic region. 


hypothetical 54.4 kd protein in 
hhf2-ume3 intergenic region. 


hypothetical 108.5 kd protein in
ume3-pub1 intergenic region.


hypothetical 108.5 kd protein in
ume3-pub1 intergenic region.


hypothetical 56.2 kd protein in
ume3-pub1 intergenic region.


hypothetical 56.2 kd protein in
ume3-pub1 intergenic region.


hypothetical 80.1 kd protein in
ume3-pub1 intergenic region.


hypothetical 80.1 kd protein in
ume3-pub1 intergenic region.


hypothetical 49.9 kd protein in
spo1-sis1 intergenic region.


hypothetical 27.5 kd protein in
spo1-sis1 intergenic region.


hypothetical trp-asp repeats
containing protein in sis1-mrpl2
intergenic region.


hypothetical trp-asp repeats
containing protein in sis1-mrpl2
intergenic region.






CONTIG377 


CONTIG2241 


CONTIG4999 


CONTIG5614 


blx!6339.x 


CONTIG3292 


CONTIG3292 


CONTIG4734 


CONTIG4208


CONTIG3682


CONTIG5770


CONTIG4061


CONTIG4265


CONTIG22


CONTIG5386


CONTIG4265 


CONTIG914 


13907792_c1_7


20391382_f1_1


23847625_f2_3


22147827_c1_2


33240886_f1_2


15306 


15305 


15304 


15303 


15302 


15301 


15300 


15299 


15298 


15297 


15296 


15295 


15294 


15293 


15292 


15291 


15290 


1185


1428


2094


1137


P50947 


P50947 


P53932 


P53934 


P48231 


P48233 


P48233 


P53938 


P53941 


P53944 


P53949 


P53950 


P53951 


P53951 


P53952 


P53953 


P53958 


3.1(10)-46


6.2(10)-31


6.4(10)-10


0.0033


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


Saccharomyces 
cerevisiae 


hypothetical 37.0 kd protein in ras2- 
ypt53 intergenic region. 


hypothetical 37.0 kd protein in ras2- 
ypt53 intergenic region. 


hypothetical 71.2 kd protein in ras2- 
ypt53 intergenic region. 


hypothetical 45.5 kd protein in 
ypt53-rho2 intergenic region. 


hypothetical 132.5 kd protein in 
top2-mkt1 intergenic region.


putative mitochondrial carrier
ynl083w.


putative mitochondrial carrier
ynl083w.


hypothetical 41.7 kd protein in
pms1-tpm1 intergenic region.


hypothetical 33.5 kd protein in
mks1-msk1 intergenic region.


hypothetical 35.9 kd protein in
mas5-gcd10 intergenic region.


hypothetical 22.5 kd protein in 
nop2-omp2 intergenic region. 


hypothetical 128.1 kd protein in 
omp2-msg5 intergenic region. 


hypothetical 45.6 kd protein in 
cox5a-yip3 intergenic region. 


hypothetical 45.6 kd protein in
cox5a-yip3 intergenic region. 


hypothetical 31.4 kd protein in 
cox5a-yip3 intergenic region. 


hypothetical 98.9 kd protein in 
cox5a-yip3 intergenic region. 


hypothetical 43.7 kd protein in yip3- 
tfc5 intergenic region. 
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hypothetical 1 19.3 kd protein in 
fpr1-tom22 intergenic region.


hypothetical 27.7 kd protein in cpt1-
spc98 intergenic region.


hypothetical 109.8 kd protein in
cpt1-spc98 intergenic region.


hypothetical 109.8 kd protein in
cpt1-spc98 intergenic region.


hypothetical 54.9 kd protein in
spc98-tom70 intergenic region.


hypothetical 110.9 kd protein in
spc98-tom70 intergenic region.


hypothetical 110.9 kd protein in
spc98-tom70 intergenic region.


hypothetical 110.9 kd protein in
spc98-tom70 intergenic region.


hypothetical 13.2 kd protein in
spc98-tom70 intergenic region.


hypothetical 56.5 kd protein in
tom70-psu1 intergenic region.
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tom70-psul intergenic region. 


hypothetical 57.6 kd protein in 
mls1-rpc19 intergenic region.


hypothetical 74.0 kd protein in
mls1-rpc19 intergenic region.


hypothetical 74.0 kd protein in
mls1-rpc19 intergenic region.


hypothetical 30.7 kd protein in
cyb5-leu4 intergenic region.


hypothetical 27.2 kd protein in pol1-
ras2 intergenic region.
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hypothetical 46.5 kd protein in 
npr1-rps3 intergenic region.
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psdl intergenic region. 


hypothetical 49.7 kd protein in 
sko1-rpl44a intergenic region.


hypothetical 18.1 kd protein in
ygp1-yck2 intergenic region.


hypothetical 33.7 kd protein in
ygp1-yck2 intergenic region.


hypothetical 31.5 kd protein in
ygp1-yck2 intergenic region.


hypothetical 15.2 kd protein in 
yck2-rpc8 intergenic region. 


hypothetical 46.2 kd protein in 
yck2-rpc8 intergenic region. 


hypothetical 12.1 kd protein in rpc8- 
mfa2 intergenic region. 


hypothetical 84.2 kd protein in 
mfa2-mep2 intergenic region. 


hypothetical 84.2 kd protein in 
mfa2-mep2 intergenic region. 


hypothetical 41.2 kd protein in fpr1-
tom22 intergenic region.


hypothetical 41.2 kd protein in fpr1-
tom22 intergenic region.


hypothetical 41.2 kd protein in fpr1-
tom22 intergenic region.


hypothetical 41.2 kd protein in fpr1-
tom22 intergenic region.
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hypothetical 1 19.3 kd protein in 
fpr1-tom22 intergenic region.


hypothetical 119.3 kd protein in
fpr1-tom22 intergenic region.
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hypothetical 3 1 .6 kd protein in sin4- 
ure2 intergenic region. 


hypothetical 86.9 kd protein in ure2- 
ssu72 intergenic region. 


hypothetical 56.6 kd protein in ure2- 
ssu72 intergenic region. 


hypothetical 56.6 kd protein in ure2- 
ssu72 intergenic region. 


hypothetical 66.5 kd protein in 
ade12-rap1 intergenic region.


hypothetical 36.2 kd protein in rap1-
mer1 intergenic region.


hypothetical 36.2 kd protein in rap1-
mer1 intergenic region.


hypothetical 25.3 kd protein in rap1-
mer1 intergenic region.


hypothetical 88.8 kd protein in rap1-
mer1 intergenic region.


hypothetical 88.8 kd protein in rap1-
mer1 intergenic region.


hypothetical 49.1 kd protein in ssb2-
spx18 intergenic region.


hypothetical 27.5 kd protein in
spx19-gcr2 intergenic region.


hypothetical 63.9 kd protein in
whi3-chs1 intergenic region.


hypothetical 63.9 kd protein in
whi3-chs1 intergenic region.


hypothetical 40.2 kd protein in
chs1-srp1 intergenic region.


hypothetical 22.0 kd protein in
chs1-srp1 intergenic region.


hypothetical 61.8 kd protein in
npr1-rps3 intergenic region.






CONTIG1991 


CONTIG4923 


CONTIG4986 


CONTIG833 


CONTIG5479 


CONTIG1975 


CONTIG763


CONTIG5793 


CONTIG2965 


CONTIG3027 


CONTIG4992 


CONTIG3704 


CONTIG3387 


CONTIG2129 


CONTIG4247 


CONTIG5475 


CONTIG5329 


26265885_0_2


25470067_O_5


21516953_f2_1


480202_c3_10


15374 


15373 


15372 


15371 


15370 


15369 


15368 


15367 


15366 


15365 


15364 


15363 


15362 


15361 


15360 


15359 


15358 


1200


1035


1197


2769


1167


1791


2961


Q09625 


P43132 


P29953 


P30638 


P17778


P34624 


P53847 


P53850 


P53853 


P53854 


P53855 


P53855 


P53855 


P23503 


P53857 


P53858 


P53858 


0.0061


4.2(10)-7


0.00119


5.7(10)-6


0.00239


9.4(10)-9


7.0(10)-12


2.8(10)-9


0.00079


8.6(10)-26


1.2(10)-20


0.00169


Caenorhabditis 
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meliloti 
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elegans 


Yersinia pestis 
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elegans 
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cerevisiae 
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hypothetical 84.3 kd protein 
zk945.10 in chromosome ii. 


hypothetical protein in pprl 5'region 
(orfx) (fragment). 


hypothetical 18.2 kd protein in pmi 
5'region (orf1).


hypothetical 59.1 kd protein
zk637.1 in chromosome iii. 


outer membrane protein yopm. 


hypothetical 63.5 kd protein 
zk353.1 in chromosome iii. 


hypothetical 88.1 kd protein in atx1-
sip3 intergenic region.


hypothetical 46.2 kd protein in sip3-
mrpl30 intergenic region.


hypothetical 30.6 kd protein in
rpa49-sui1 intergenic region.


hypothetical 20.4 kd protein in
rpa49-sui1 intergenic region.


hypothetical 178.4 kd protein in
sla2-zwf1 intergenic region.


hypothetical 178.4 kd protein in
sla2-zwf1 intergenic region.


hypothetical 178.4 kd protein in
sla2-zwf1 intergenic region.


hypothetical 54.2 kd protein in
zwf1-blh1/lap3 intergenic region.


hypothetical 47.8 kd protein in sin4- 
ure2 intergenic region. 


hypothetical 100.6 kd protein in 
sin4-ure2 intergenic region. 


hypothetical 100.6 kd protein in 
sin4-ure2 intergenic region. 
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Saccharomyces 
cerevisiae 
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cerevisiae 
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cerevisiae 
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cerevisiae 
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cerevisiae 
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elegans 
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Candida albicans 


Drosophila 
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norvegicus 
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hypothetical 18.4 kd protein in 
rad10-prs4 intergenic region.


hypothetical 58.0 kd protein in
van1-dat1 intergenic region.


hypothetical 126.1 kd protein in
ndi1-atr1 intergenic region.


hypothetical 57.7 kd protein in ndi1-
atr1 intergenic region.


hypothetical 65.2 kd protein in
cox14-hmgs intergenic region.


hypothetical 59.6 kd protein in
cox14-hmgs intergenic region.


hypothetical 59.6 kd protein in
cox14-hmgs intergenic region.


hypothetical 65.0 kd protein in
cox14 5'region precursor.


hypothetical 26.6 kd protein t19c3.4
in chromosome iii.


hypothetical trp-asp repeats
containing protein in pom152-
rec114 intergenic region.


hypothetical trp-asp repeats
containing protein in nup116-far3
intergenic region.


glycolipid 2-alpha-
mannosyltransferase (ec 2.4.1.131)
(alpha-1,2-mannosyltransferase).


dynein light chain 1, cytoplasmic. 


dynein intermediate chain 2, 
cytosolic (dh ic-2). 


hypothetical 44.9 kd protein in 
ura10-nrc1 intergenic region.
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hypothetical 76. 1 kd protein in 
ung1-psp2 intergenic region.


hypothetical 18.5 kd protein in
ndc1-tsa1 intergenic region.


hypothetical 17.7 kd protein in
amd1-rad52 intergenic region.


hypothetical 74.2 kd protein in
amd1-rad52 intergenic region.


hypothetical 74.2 kd protein in
amd1-rad52 intergenic region.


hypothetical 20.7 kd protein in cat2-
amd1 intergenic region.


hypothetical 49.6 kd protein in cat2-
amd1 intergenic region.


hypothetical 153.8 kd protein in
gal80-prp39 intergenic region.


hypothetical 40.9 kd protein in
dak1-orc1 intergenic region.


hypothetical 40.7 kd protein in
dak1-orc1 intergenic region.


hypothetical 40.7 kd protein in
dak1-orc1 intergenic region.


hypothetical 54.1 kd protein in
dak1-orc1 intergenic region.


hypothetical 69.8 kd protein in
yl16a-dak1 intergenic region.


hypothetical 171.1 kd protein in
yl16a-dak1 intergenic region.


hypothetical 18.4 kd protein in cpr3- 
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hypothetical 76.9 kd protein in 
rpm2-tub1 intergenic region.


hypothetical 103.0 kd protein in
rad10-prs4 intergenic region.
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hypothetical 55.4 kd protein in 
mcm1-nup116 intergenic region.


hypothetical 38.2 kd protein in
sub1-argr1 intergenic region.


hypothetical 48.4 kd protein in
tap42-imp2 intergenic region.


hypothetical 59.3 kd protein in
tap42-imp2 intergenic region.


hypothetical 54.1 kd protein in
mrpl3-tap42 intergenic region.


hypothetical 60.1 kd protein in
sec59-erg5 intergenic region.


hypothetical 145.2 kd protein in
hxt2-sec59 intergenic region.


hypothetical 46.9 kd protein in plb1-
hxt2 intergenic region.


hypothetical 20.9 kd protein in plb1-
hxt2 intergenic region.


hypothetical 16.7 kd protein in
cdc5-mvp1 intergenic region.


hypothetical 84.6 kd protein in glo1-
ypt7 intergenic region.


hypothetical 34.0 kd protein in glo1-
ypt7 intergenic region.


hypothetical 52.7 kd protein in
pdr4-glo1 intergenic region.


hypothetical 66.8 kd protein in
ppz1-spt5 intergenic region.


hypothetical 43.7 kd protein in 
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hypothetical 37.9 kd protein in 
ung1-psp2 intergenic region.


hypothetical 76.1 kd protein in
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hypothetical 58.0 kd protein in ilv2- 
ade17 intergenic region.


hypothetical 15.2 kd protein in ilv2-
ade17 intergenic region.


hypothetical 72.2 kd protein in
ctf13-ypk2 intergenic region.


hypothetical 34.0 kd protein in
ctf13-ypk2 intergenic region.


hypothetical 70.4 kd protein in
ctf13-ypk2 intergenic region.


hypothetical 42.1 kd protein in
ctf13-ypk2 intergenic region.


hypothetical 57.7 kd protein in aip1-
ctf13 intergenic region.


hypothetical 57.7 kd protein in aip1-
ctf13 intergenic region.


hypothetical 57.7 kd protein in aip1-
ctf13 intergenic region.


hypothetical 147.0 kd protein in
abf2-chl12 intergenic region.


hypothetical 147.0 kd protein in
abf2-chl12 intergenic region.


hypothetical 147.0 kd protein in
abf2-chl12 intergenic region.


hypothetical 78.8 kd protein in abf2-
chl12 intergenic region.


hypothetical 18.7 kd protein in
hms1-abf2 intergenic region.


hypothetical 36.4 kd protein in
nup116-far3 intergenic region.


hypothetical 55.4 kd protein in
mcm1-nup116 intergenic region.


hypothetical 55.4 kd protein in
mcm1-nup116 intergenic region.
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hypothetical 52.2 kd protein in rarl- 
scj1 intergenic region.


hypothetical 47.3 kd protein in
tom40-pfk2 intergenic region.


hypothetical 28.9 kd protein in cln1-
rad14 intergenic region.


hypothetical 126.6 kd protein in
rpl39-cik1 intergenic region.


hypothetical 126.6 kd protein in
rpl39-cik1 intergenic region.


hypothetical 113.2 kd protein in
sso2-hsc82 intergenic region.
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hypothetical 31.1 kd protein in 
sip18-spt21 intergenic region.


hypothetical 56.2 kd protein in
sip18-spt21 intergenic region.


hypothetical 162.7 kd protein in
sip18-spt21 intergenic region.


hypothetical 62.5 kd protein in ald5-
ddr48 intergenic region.


hypothetical 17.5 kd protein in
imp1-hlj1 intergenic region.


hypothetical 17.5 kd protein in
imp1-hlj1 intergenic region.


hypothetical 29.1 kd protein in
imp1-hlj1 intergenic region.


hypothetical 60.0 kd protein in
imp1-hlj1 intergenic region.


hypothetical 60.0 kd protein in 
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L. 


hypothetical 35.3 kd protein in 
pom152-rec114 intergenic region.
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fet4 intergenic region. 
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glc8 intergenic region. 
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jnml-lcbl intergenic region. 
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jnml-lcbl intergenic region. 
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msul-jnml intergenic region. 
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dsk2-cat8 intergenic region. 


hypothetical 16.2 kd protein in 
prp24-rrn9 intergenic region.


[ui:mth649] [pn:conserved protein]
[gtcfc:14.1:14.2] [keggfc:14.2]
[genomfc:14.2] [db:gtc-
methanobacterium
thermoautotrophicum]


[ui:mth232] [pn:conserved protein]
[gtcfc:14.1:14.2] [keggfc:14.2]
[genomfc:14.2] [db:gtc-
methanobacterium
thermoautotrophicum]


[ui:mth1621] [pn:gtp-binding
protein, gtp1/obg family]
[gtcfc:14.1:14.2:14.3] [keggfc:14.2]
[genomfc:13.7] [db:gtc-
methanobacterium
thermoautotrophicum]


[ui:mth1280] [pn:pet112-like
protein] [gtcfc:14.1:14.2:14.3]
[keggfc:14.2] [genomfc:13.7]
[db:gtc-methanobacterium
thermoautotrophicum]


[ui:mth1005] [pn:conserved protein]
[gtcfc:14.1:14.2] [keggfc:14.2]
[genomfc:14.2] [db:gtc-
methanobacterium
thermoautotrophicum]


[ui:mth875] [pn:3-chlorobenzoate-
3,4-dioxygenase dyhydrogenase
related protein] [gtcfc:14.1:14.3]
[ec:1.1.1.18] [keggfc:14.1]
[genomfc:13.7] [db:gtc-
methanobacterium
thermoautotrophicum]
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